These are my raw notes from talks held on Wednesday at LCA2014. They may contain errors and mis-heard quotes, and are completely un-reviewed and not spell checked:
Lightning Talks / Conference Open
- Interesting attendees: Linus, Tridge, Jon Oxer
- Zero footprint discovery
- Extremely scalable monitoring
Problems Addressed
- Risk Management
- Maintaining a detailed discovery database
- Discovering forgotten systems
- Software discovery
- Monitoring services and systems
- Finding unmonitored services
- Intrusions
- Why Discovery?
Unique Powerful Features
- Continuous discovery by listening
- Zero network footprint
- Every change noticed
- Dependency discovery
- Low network load
Uniformly, fully distributed work
- Monitoring and discovery are fully distributed
- Reliable
- Only edge conditions are centralised
- Adding systems does not increase monitoring work
- Each server monitors 2 or 4 neighbours
- Each server monitors its own services
- Repair and alerting is low volume
- Detects switch failure by nominating 1 server per switch for a cross switch ring.
- 95% of traffic stays in the same switch
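A toy sketch of the neighbour-monitoring ring described above, assuming a simple ordered server list (the real project's ring management is more involved): each server watches its two immediate neighbours, so per-server monitoring work stays constant as the fleet grows.

```python
# Toy sketch (not the project's actual code): arrange servers in a ring and
# have each one monitor its two immediate neighbours.
def ring_neighbours(servers):
    """Return {server: [left_neighbour, right_neighbour]} for a ring."""
    n = len(servers)
    return {
        servers[i]: [servers[(i - 1) % n], servers[(i + 1) % n]]
        for i in range(n)
    }

servers = ["srv01", "srv02", "srv03", "srv04", "srv05"]
for server, neighbours in ring_neighbours(servers).items():
    print(f"{server} monitors {neighbours}")
```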
Architectural Components
- Collective Management Authority - per installation
- Nano probes - per server
- Data storage
- Nanoprobe management:
- Configure and direct
- Hear alerts and discovery
- Update rings (join / leave)
- Update database
- Issue alerts
- Nanoprobe functions
- Announce self to CMA
- Do as CMA instructs (sketch after this list)
- No persistent state across reboots
- Linux-HA Base Service Monitoring
- Local Resource Manager (LRM)
- Pros:
- Simple, scalable
- Uniform work distribution
- No single point of failure (the CMA can be clustered)
- Light network load
- Multi-tenant
- Cons:
- Active agents
- Potentially slow startup at power-on (for large numbers of machines)
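A toy illustration of the nanoprobe startup flow noted above (announce self to the CMA, then do as the CMA instructs, keeping no persistent state). The UDP transport, port number and JSON message shape are assumptions for the sketch, not the project's actual wire protocol.

```python
# Toy nanoprobe sketch: announce to the CMA, then loop waiting for instructions.
import json
import socket

CMA_ADDR = ("192.0.2.1", 1984)   # placeholder CMA address and port

def announce_and_listen():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    hello = {"type": "announce", "hostname": socket.gethostname()}
    sock.sendto(json.dumps(hello).encode(), CMA_ADDR)

    while True:                               # no persistent state kept locally
        data, _ = sock.recvfrom(65535)
        instruction = json.loads(data)
        print("CMA instructs:", instruction)  # e.g. start monitoring a service

if __name__ == "__main__":
    announce_and_listen()
```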
- Why a Graph DB
- Humans describe things as graphs
- Dependency and Discovery is fundamentally a graph
- Speed of a graph query depends on the size of the sub-graph, not the total graph
- Natural visualisation
- Schema-less design: good for heterogeneous environments
- Graph model == object model
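A small illustration of the "dependency is fundamentally a graph" point: answering "what does this host depend on?" only walks the sub-graph reachable from that host, no matter how large the whole graph is. This is a toy adjacency dict, not a real graph database.

```python
# Toy dependency graph: query cost is proportional to the reachable sub-graph.
from collections import deque

depends_on = {
    "web01": ["nginx", "app01"],
    "app01": ["postgres", "redis"],
    "nginx": [], "postgres": ["san01"], "redis": [], "san01": [],
    "batch17": ["nfs01"], "nfs01": [],   # unrelated nodes: never visited below
}

def dependencies(node):
    """Breadth-first walk of everything `node` transitively depends on."""
    seen, queue = set(), deque([node])
    while queue:
        for dep in depends_on.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(dependencies("web01"))   # {'nginx', 'app01', 'postgres', 'redis', 'san01'}
```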
Discovery API
- Scripts perform discovery with JSON output
- Three sample discovery snippets
- OS Information
- Service discovery
- Client discovery
- Service discovery is brilliant.
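A hedged sketch of what a discovery snippet in this style might look like: gather some OS information and print it as JSON for the monitoring system to store. The field names here are assumptions, not the project's actual schema.

```python
# Sketch of an OS-information discovery script that emits JSON.
import json
import platform
import socket

os_info = {
    "discovertype": "os",          # assumed label, not the project's schema
    "host": socket.gethostname(),
    "data": {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
    },
}

print(json.dumps(os_info, indent=2))
```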
Current Status
- Released in April 2013
- Nanoprobe is functional
- Need adopters
- Checkpoint / Restore
- Otherwise standard Linux
- Namespaces
- Allows granularity
- Presents a subset of host resources
- Allows picking and choosing components
- Not everything is namespace aware
- setns allows you to enter a namespace (sketch after this list)
- No need to ssh into a namespace
- Veth is a virtual ethernet pipe
- Containers need multi-layer security defences - no one tool currently provides what's required.
- LXC is worth looking at. Docker is built on LXC
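A minimal sketch of the setns point above, assuming Linux, root privileges and glibc: open the target process's namespace file under /proc and call setns(2) to join its network namespace instead of ssh'ing in.

```python
# Enter another process's network namespace via setns(2).
import ctypes
import os

CLONE_NEWNET = 0x40000000  # from <sched.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def enter_net_namespace(pid):
    """Switch this process into the network namespace of `pid`."""
    fd = os.open(f"/proc/{pid}/ns/net", os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNET) != 0:
            raise OSError(ctypes.get_errno(), "setns failed")
    finally:
        os.close(fd)

# Example: enter_net_namespace(container_init_pid); sockets opened afterwards
# use the container's network stack.
```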
- TPP impacts domestic legislative capabilities
- There's a lack of transparency
- Restrictive intellectual property impacts
- Potentially affects access to affordable medicine
- ISPs to monitor IP infringements
- Corporations more able to sue the state for laws that impact them
- Creates an infringement of national sovereignty
What Can You Do?
Political Landscape
- The Greens and the Pirate Party are actively opposed and are calling for transparency
- The ALP appears to be against ISDS despite previously negotiating
- The Nationals are giving hints of disquiet
- The Liberals claim TPPA will be good for industry
Groups with a Tech Focus
Broader Coalitions
- AFTINET
- Choice Australia
- ACTU
- Public Health Association Australia
- MADGE Australia
- Environmental activism
Draw on other strategies
- Utilise Beautiful Trouble
- Consider and map your spectrum of allies
- Shift the discourse
- Tactics that welcome participation
- Allow for tiered participation
- Make it possible for anyone who wants to do something to do something
- Ensure compelling frameworks
- Think strategically about how you frame it
- Think about your organisation's structure
- Avoid burnout - keep a balanced life
Direct Action
- Bring about the change you want to see
- Can gain visibility for negotiations
Internal Strategies
- Approaches:
- Cloud provisioning
- Traditional system administration
- Targeting and classifying nodes
- Configuration management vs monitoring
- Parametrisation - word of the conference
- Parametrise your automation system
- Define data in one place
- RECLASS will merge it (see the merge sketch after this list)
- Currently uses the yaml_fs backend
- Multiple inheritance
- Adapters interface between configuration management and reclass
- CLI switches
- output in YAML / JSON
- Ansible and SALT are supported currently
- SALT integration is now via a SALT module (better performance)
- Provides inventory information for Ansible
- Future work
- Logging framework
- Membership lists
- Tests
- Disk caching
- Long running process
- Composable(?)
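A toy illustration (not reclass's actual API) of the merge behaviour described above: a node lists several classes, their parameters are deep-merged in order, and node-level parameters win, so each value is defined in exactly one place.

```python
# Toy parameter merge in the spirit of reclass; data that would normally live
# in yaml_fs class files is inlined here as dicts.
def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

classes = {
    "base":      {"ntp": {"servers": ["ntp1.example.com"]}, "timezone": "UTC"},
    "webserver": {"nginx": {"worker_processes": 4}},
    "sydney":    {"ntp": {"servers": ["ntp.syd.example.com"]}},
}

node = {"classes": ["base", "webserver", "sydney"],
        "parameters": {"nginx": {"worker_processes": 8}}}

params = {}
for cls in node["classes"]:                        # multiple inheritance, in order
    params = deep_merge(params, classes[cls])
params = deep_merge(params, node["parameters"])    # node-level overrides win

print(params)
```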
- Rollup - alert summarisation
- Alert routing
- Does three things
- Receives an event (event sketch after this list)
- API (RESTful JSON)
- No restarts required
- Bulletproof use it in anger; two developers are paid to work on it.
- Ruby, Redis, EventMachine based.
- Designed for humans
- Considers alert fatigue
- Normalcy bias
- Confirmation bias
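A minimal sketch of feeding Flapjack an event, assuming its convention of consuming JSON events from a Redis list named "events" (the field names are from memory and may differ between versions).

```python
# Push a JSON check result onto the Redis queue Flapjack is assumed to consume.
import json
import time
import redis  # pip install redis

event = {
    "entity":  "web01.example.com",   # hypothetical host
    "check":   "HTTP",
    "type":    "service",
    "state":   "critical",            # ok / warning / critical / unknown
    "summary": "HTTP 500 from / on web01",
    "time":    int(time.time()),
}

r = redis.Redis(host="localhost", port=6379)
r.lpush("events", json.dumps(event))   # Flapjack workers pop and route these
```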
Why?
- Multi-tenant support
- Segregated responsibility
- Check engine independence (event producers)
- Self-checking with oobetet
- Rollup - alert summarisation
- Contacts store media types (email, SMS, etc.), summary thresholds, entities, checks and history
- Hooks up to Google Hangouts / Jabber media types
- Tagging can be used for grouping
- No hard/soft states
- Nagios / Icinga used as a dumb alert checker (only configure check execution)
- Allows scaling
@Bulletproof
- Process >~60 events/second
- Manage - a customer portal for customers to manage their own notification rules
- manage-flapjack-sync does what it says :-)
Shortcomings
- 30s fixed broadcast delay (why?)
- Assumes a single external source of truth (Puppet, CRM, Ansible via API - needs to be written)
- Contacts need to be imported and exported from an external source (what sources? - whatever sources you write support for)
Other Features:
- Release planning is public
- As is bug tracking
- Semantic versioning
- Write/run tests (unit and integration)
- .deb .rpm packages provided
- Solid documentation available
- A bad first experience is considered a bug
- Slides
- Use file level syncing, handle exceptions via configuration management (sync sketch at the end of these notes)
- Google wrote a custom rsync
- Root partition is the same on all servers
- Ran Red Hat 7.1 for over 10 years and wanted to upgrade without rebooting (were the machines that old too?)
- On all of Google's production machines
- At Google's scale, maintaining your own distro based on Debian makes a lot of sense
- File level syncing recovers from any state and is more reliable
- Forcing services not to write to the root FS helps with distro switches
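A toy illustration of the file level syncing approach in these notes: sync a golden root image file-by-file, excluding the paths that configuration management owns. It uses plain rsync rather than Google's custom rsync, and all paths and hostnames are made up.

```python
# Sync the root filesystem from a golden image, skipping config-managed paths.
import subprocess

GOLDEN_IMAGE = "goldmaster::root/"                            # hypothetical rsync source
CONFIG_MANAGED = ["/etc/hostname", "/etc/network/", "/var/"]  # exceptions

def sync_root(dry_run=True):
    cmd = ["rsync", "-aHx", "--delete"]
    cmd += [f"--exclude={path}" for path in CONFIG_MANAGED]
    if dry_run:
        cmd.append("--dry-run")        # preview changes before applying
    cmd += [GOLDEN_IMAGE, "/"]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sync_root(dry_run=True)
```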