Think Twice Before Adopting EKS Auto Mode

I’ve been using AWS’s EKS service for many years. What began as a bare-bones offering has slowly and steadily gained features. Amazon recently announced “auto mode” for EKS clusters. In this new mode AWS manages a Karpenter installation, core addons such as coredns and vpc-cni, and the underlying ec2 nodes. On paper this sounds great, but as always the devil is in the details. In this instance, it is important to be aware of the roughly 10% price premium for this mode: in auto mode AWS charges an hourly management fee for each ec2 instance in addition to the standard instance charge. While some folks may find this a fair trade-off, I would not choose this mode myself. I run clusters using Karpenter as the autoscaler and Bottlerocket OS for the underlying nodes. In this configuration the management overhead is extremely low. I even gave a short talk on this setup for AWS a while back. IMO, if you are already running an EKS cluster with Karpenter and Bottlerocket, you have little to gain from auto mode relative to the price premium. ...
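For reference, the self-managed setup described here can be expressed with Karpenter’s v1beta1 API, where the node class selects Bottlerocket AMIs. This is a minimal sketch, not the exact config from the talk; the role name and discovery tags are assumptions:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: bottlerocket
spec:
  amiFamily: Bottlerocket            # Karpenter provisions Bottlerocket OS nodes
  role: "my-karpenter-node-role"     # assumed IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # assumed discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```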

December 9, 2024 · 1 min · dschaaff

Monitoring Creation of Log Files in s3

I manage several apps that write various pieces of data to the local file system and rely on Fluentd to ship them to s3. There is solid monitoring around the Fluentd aggregator process, but I wanted better visibility and alerting when things aren’t written to s3 as expected. The solution I came up with was a custom Datadog check. The files I am monitoring are written under a bucket path like example-logs/data/event-files/year/month/day. Each day a new path is created in the s3 bucket for the current date, e.g. logs/data/example-log/2018/08/15. The Datadog check sends the count of objects in the current date’s directory as a gauge. You can then monitor that objects are created each day as expected and at a normal rate. ...
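The core of such a check can be sketched in Python: one helper builds the date-based key prefix and another counts the keys under it with boto3. The app and bucket names here are placeholders, and the real check would wrap this in a Datadog custom check that submits the count as a gauge:

```python
from datetime import date

def current_log_prefix(app: str, day: date) -> str:
    """Build the day's key prefix, e.g. data/example-log/2018/08/15."""
    return f"data/{app}/{day:%Y/%m/%d}"

def count_objects(bucket: str, prefix: str) -> int:
    """Count the objects under prefix; the check reports this number as a gauge."""
    import boto3  # imported here so the prefix helper stays dependency-free
    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
    return sum(page.get("KeyCount", 0) for page in pages)
```

Alerting on this gauge flatlining at zero partway through the day catches a broken Fluentd pipeline without having to watch the aggregator process itself.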

August 29, 2018 · 1 min · dschaaff

Bash Function to SSH into ec2 Instances

I’ve often found myself with an instance ID that I want to log in to and look at something. It sucks looking up the IP when you don’t know the DNS name. I’m sure there are other ways to do this, but here is what I came up with.

```bash
getec2ip() {
  # Print the private IP of the instance ID passed as $1
  aws ec2 describe-instances --instance-ids "$1" \
    | jq --raw-output '.Reservations[0].Instances[0].PrivateIpAddress'
}

assh() {
  host=$(getec2ip "$1")
  ssh user@"${host}"
}
```

This relies on the aws cli and jq to parse out the IP and has made it much easier for me to quickly hop on an instance. ...

August 17, 2017 · 1 min · dschaaff

Jenkins Dynamic EC2 Executors

There is a nice plugin for Jenkins that lets you dynamically add capacity by spinning up EC2 instances on demand and then terminating them once the job queue empties. This is a great way to save money on an AWS-based build infrastructure. Unfortunately, the plugin documentation is really light and there are a few gotchas to look out for.

Security Groups

This field only accepts comma-separated security group IDs, not names. This is frustrating because other fields in the plugin take a space-separated list (e.g. labels). ...
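For illustration, the inconsistency looks like this in the plugin’s configuration fields (the IDs and labels below are made up):

```text
Security group IDs:  sg-0a1b2c3d4e5f6a7b8,sg-0f9e8d7c6b5a43210   # comma-separated, IDs only
Labels:              linux docker build                            # space-separated
```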

July 25, 2017 · 2 min · dschaaff

Terraform AMI Maps

Up until today we had been using a map variable in Terraform to choose our Ubuntu 14 AMI based on region.

```hcl
variable "ubuntu_amis" {
  description = "Mapping of Ubuntu 14.04 AMIs."

  default = {
    ap-northeast-1 = "ami-a25cffa2"
    ap-southeast-1 = "ami-967879c4"
    ap-southeast-2 = "ami-21ce8b1b"
    cn-north-1     = "ami-d44fd2ed"
    eu-central-1   = "ami-9cf9c281"
    eu-west-1      = "ami-664b0a11"
    sa-east-1      = "ami-c99518d4"
    us-east-1      = "ami-c135f3aa"
    us-gov-west-1  = "ami-91cfafb2"
    us-west-1      = "ami-bf3dccfb"
    us-west-2      = "ami-f15b5dc1"
  }
}
```

We would then set the AMI ID like so when creating an ec2 instance:

```hcl
ami = "${lookup(var.ubuntu_amis, var.region)}"
```

The problem we ran into is that we now use Ubuntu 16 by default and wanted to expand the AMI map to contain its IDs as well. I quickly discovered that nested maps like the one below work. ...
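One pattern commonly used in pre-0.12 Terraform for this situation is a flat map keyed by both OS and region, looked up with an interpolated composite key. This is a sketch of that approach, not necessarily the post’s solution; the Ubuntu 16 AMI ID below is a placeholder:

```hcl
variable "amis" {
  default = {
    "ubuntu14-us-east-1" = "ami-c135f3aa"
    "ubuntu16-us-east-1" = "ami-00000000"   # placeholder, not a real AMI ID
  }
}

# Composite key selects both the OS and the region
ami = "${lookup(var.amis, "${var.os}-${var.region}")}"
```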

December 2, 2016 · 2 min · dschaaff

OpenVPN and ec2 Jumbo Frames

While troubleshooting site-to-site links running OpenVPN recently, I ran into an issue with MTU sizing on the ec2 end. When we originally set up the links we followed the performance tuning advice found here. The relevant portion is that we set

```
tun-mtu 6000
```

Why did we do this? Here’s OpenVPN’s explanation:

By increasing the MTU size of the tun adapter and by disabling OpenVPN's internal fragmentation routines the throughput can be increased quite dramatically. The reason behind this is that by feeding larger packets to the OpenSSL encryption and decryption routines the performance will go up. The second advantage of not internally fragmenting packets is that this is left to the operating system and to the kernel network device drivers. For a LAN-based setup this can work, but when handling various types of remote users (road warriors, cable modem users, etc) this is not always a possibility.

During later testing we discovered that we could easily push 40 Mb/s over the OpenVPN tunnel into the ec2 instance, but only 1 Mb/s or less going in the opposite direction. Obviously not ideal. ...
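A common remedy for oversized tunnel MTUs on paths that only carry standard 1500-byte frames is to lower the tun MTU and let OpenVPN fragment and clamp TCP MSS itself. These directives exist in OpenVPN; the specific values below are typical illustrative choices, not necessarily the settings this post landed on:

```text
# both ends of the tunnel
tun-mtu 1500     # match the path MTU instead of exceeding it
fragment 1300    # re-enable OpenVPN's internal fragmentation
mssfix 1300      # clamp TCP MSS so payloads fit after encapsulation overhead
```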

August 16, 2016 · 2 min · dschaaff