Ops Roadmap for Learn.co
Published Thursday, March 22, 2018
Notes from the Flatiron School engineering team’s code reading on future ops work.
Outline
- Review current setup
- Share upcoming challenges/priorities
- Share roadmap + new concepts/tools
Priorities
- Ensure new collaborative ventures are successful
- Support our team as we grow (make ops more automated + manageable)
Requirements
- Move to AWS for hosting
- Need a high degree of infrastructure + environment automation and orchestration (Terraform)
- Scaling
- Security
- Lower maintenance costs as our team grows
Current Setup
- Hosted on Digital Ocean
- Self-hosted services:
  - Postgres
  - Redis
  - Elasticsearch
  - Memcached
  - Pushstream
- Our virtual servers are on a private network in a DO region
Pain points
- Communication between services is not automated (no robust tooling available)
- Our servers are “pets not cattle”
- High maintenance costs
- Lots of outages
- Infrastructure is not self-healing (no robust tooling available)
- Low security
- Noisy alerts (Nagios)
- Relying on manual (aka more brittle) deployment and provisioning processes (Chef)
- Our virtual servers are on shared machines, so vulnerable to leaks / attacks
Roadmap
Security
Guiding principle: Principle of Least Privilege (limit surface area / attack vectors)
- GitHub 2FA
- Remove root AWS keys
- Secrets encryption through AWS KMS (in progress)
- VPN for local development
- Migrate to AWS Virtual Private Cloud
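As a rough illustration of the Principle of Least Privilege, the sketch below builds an IAM policy document that allows only decryption with a single KMS key, plus a small check that flags wildcard grants. The key ARN, account ID, and both function names are made up for this example; it is not the policy the team uses.

```python
import json

# Hypothetical key ARN, used only for illustration.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"


def least_privilege_policy(key_arn):
    """Build an IAM policy document that allows decrypting with one KMS key only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": key_arn,
            }
        ],
    }


def as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return the Allow statements that use a wildcard action or a '*' resource."""
    broad = []
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt["Action"])
        resources = as_list(stmt["Resource"])
        if any("*" in a for a in actions) or any(r == "*" for r in resources):
            broad.append(stmt)
    return broad


policy = least_privilege_policy(KMS_KEY_ARN)
print(json.dumps(policy, indent=2))
print("broad statements:", len(overly_broad_statements(policy)))
```

The narrower the action list and resource, the less an attacker gains from a leaked credential, which is the "limit surface area" idea in the guiding principle.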
More on AWS Virtual Private Cloud
- Public and Private subnets
- Services that don’t need to be exposed to the internet (Redis, etc.) will live in the private subnet
- NAT Gateway rules to manage traffic
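The public/private split can be sketched with Python's standard `ipaddress` module. The VPC CIDR, subnet sizes, and service placement below are assumptions chosen for illustration, not the actual network plan. The route table is simplified: a public subnet's default route points at an internet gateway, while a private subnet's outbound traffic goes through a NAT gateway.

```python
import ipaddress

# Assumed VPC address range, for illustration only.
VPC_CIDR = "10.0.0.0/16"


def plan_subnets(vpc_cidr, new_prefix=24):
    """Split the VPC range and take the first two blocks as public and private."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)
    return {"public": next(blocks), "private": next(blocks)}


# Simplified default routes per subnet tier.
ROUTES = {
    "public": {"0.0.0.0/0": "internet-gateway"},
    "private": {"0.0.0.0/0": "nat-gateway"},
}

# Hypothetical placement: only the load balancer is reachable from the internet.
PLACEMENT = {"load_balancer": "public", "redis": "private", "postgres": "private"}

subnets = plan_subnets(VPC_CIDR)
for service, tier in PLACEMENT.items():
    target = ROUTES[tier]["0.0.0.0/0"]
    print(f"{service}: {tier} subnet {subnets[tier]}, default route via {target}")
```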
Scaling
All about automation
- Managed services instead of self-hosting
- Migrate DNS from Dyn to AWS Route 53
- Terraform for “Infrastructure as Code” orchestration automation
- Packer for automated AMI builds (images for Amazon instances)
- Additional things we’re thinking about:
  - Deployments
  - Alerting
  - Monitoring
  - Logs
  - Containerization / Kubernetes (way down the road)
More about Terraform
Infrastructure as code: automates your environment to match your config file (declarative code)
- Source controlled code
- Reduces documentation (self-documenting system)
- Support for multiple cloud providers
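The declarative idea can be shown with a toy reconciler: describe the desired state, compare it with what exists, and compute the changes needed to make them match. This is a conceptual sketch in Python, not how Terraform is implemented, and the resource names and settings are hypothetical.

```python
def plan(desired, actual):
    """List the changes needed to make `actual` match `desired`."""
    changes = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            changes.append(("create", name))
        elif actual[name] != config:
            changes.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            changes.append(("destroy", name))
    return changes


def apply(desired, actual):
    """Return the new state after carrying out every planned change."""
    new_state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "destroy":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state


# Hypothetical resources: what the config file says vs. what is running.
desired = {"redis": {"size": "small"}, "web": {"count": 3}}
actual = {"web": {"count": 2}, "legacy_worker": {"count": 1}}

for action, name in plan(desired, actual):
    print(action, name)
```

Running the plan again after applying yields no changes, which is what makes the config file a reliable description of the environment.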
Next steps
- Port Redis (tested)
- Port workers and SQS/Rabbit (spike in progress)