Monitoring Creation of Log Files in S3

I manage several apps that write various pieces of data to the local file system and rely on Fluentd to ship them to S3. There is solid monitoring around the Fluentd aggregator process, but I wanted better visibility and alerting when files aren’t written to S3 as expected.

The solution I came up with was a custom Datadog check. The files I am monitoring are written to paths like example-logs/data/event-files/year/month/day, and a new path is created in the S3 bucket for the current day’s date, e.g. logs/data/example-log/2018/08/15, each day. The Datadog check reports the count of objects under the current date’s prefix as a gauge. You can then monitor that objects are created each day as expected and at a normal rate.

Here is an example config

init_config:

instances:
# this will monitor s3://example-logs/data/production/event-log/<year>/<month>/<day>
  - bucket: example-logs
    prefix: data/production/event-log
    tags:
      - 'env:production'
      - 'log:event-log'
  - bucket: example-logs
    prefix: data/staging/event-log
    tags:
      - 'env:staging'
      - 'log:event-log'

The check will add the current date path to the prefix automatically.

How to Set it Up for Yourself

  • Install boto3 via the Datadog embedded pip:
/opt/datadog-agent/embedded/bin/pip install boto3
  • Add s3_object_count.py to /etc/datadog-agent/checks.d
  • Add your config file to /etc/datadog-agent/conf.d/s3_object_count.d

The code for the check is pretty simple.
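
At its core, the check just appends today’s date to the configured prefix, counts the matching objects with boto3, and reports that count as a gauge. Here is a stripped-down sketch of that logic (the class name and metric name are illustrative, not necessarily what the published check uses):

from datetime import datetime

import boto3

try:
    # Agent 6+
    from datadog_checks.base import AgentCheck
except ImportError:
    # Agent 5 fallback
    from checks import AgentCheck


class S3ObjectCountCheck(AgentCheck):
    def check(self, instance):
        bucket = instance['bucket']
        # append the current date path, e.g. data/production/event-log/2018/08/15
        prefix = '{}/{}'.format(instance['prefix'],
                                datetime.utcnow().strftime('%Y/%m/%d'))

        # credentials come from the instance profile or the usual boto3 chain
        s3 = boto3.client('s3')
        paginator = s3.get_paginator('list_objects_v2')

        count = 0
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            count += page.get('KeyCount', 0)

        # metric name is a placeholder; the tags let you monitor per log and env
        self.gauge('s3.prefix.object_count', count, tags=instance.get('tags', []))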

See https://github.com/dschaaff/datadog-checks for the full source.

Quick Tip: Using [skip-ci] in Jenkins’ Declarative Pipelines

You’ve spent the past hour meticulously crafting a README update and it’s time to commit. Great, but what if you don’t want that commit to trigger automated testing, deploys, and other actions? If you’re using Jenkins declarative pipelines, there’s a pretty simple solution.

Add the following when block to each stage you wish to skip in your Jenkinsfile.

when {
    not {
        changelog '\\[skip-ci\\]'
    }
}

We can also expand on this for other actions. For example, wrap a deploy stage with the following to avoid deploying pull requests or anything that isn’t on the master branch, while still respecting the [skip-ci] tag in a commit message.

stage('Deploy') {
    when {
        allOf {
            branch 'master'
            not {
                anyOf {
                    // matches any change request (e.g. pull request) build
                    changeRequest()
                    changelog '\\[skip-ci\\]'
                }
            }
        }
    }
    steps {
        deploy()
    }
}

Better Jenkins Notifications in Declarative Pipelines

I’ve been using declarative pipelines in Jenkins for a while with the Slack plugin to send build notifications to Slack. The plugin does what it says on the tin but gives you a pretty boring message by default.

[Screenshot: the default Slack build notification]

I used the environment variables available in the pipeline to make things a little bit better and link back to the job.

[Screenshot: the notification with environment variables and a link back to the job]

But I was still disappointed that the notifications didn’t contain more information. Thankfully, version 2.3 of the plugin added support for the attachments portion of the Slack message API, which let me get better message formatting. I also took some inspiration from this thread to incorporate test result summaries.

I store this in a shared pipeline library to avoid repeating the code in every Jenkinsfile. This way you can simply call it in a post step like this.

post {
    always {
        notifySlack currentBuild.result
    }
}

The code for the notifySlack step lives in the shared pipeline library.
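
A rough sketch of what such a step can look like is below. It assumes version 2.3+ of the Slack plugin (for attachments), the JUnit plugin for the test summary, and script approval for the rawBuild call; the exact colors and fields are placeholders rather than the original source.

// vars/notifySlack.groovy (illustrative sketch)
import groovy.json.JsonOutput

def call(String buildResult) {
    // map the build result to a Slack attachment color
    def color = 'danger'
    if (buildResult == null || buildResult == 'SUCCESS') {
        color = 'good'
    } else if (buildResult == 'UNSTABLE') {
        color = 'warning'
    }

    // pull a JUnit summary if test results were recorded
    // (requires the JUnit plugin; rawBuild access may need script approval)
    def testSummary = 'No test results recorded'
    def testAction = currentBuild.rawBuild.getAction(hudson.tasks.junit.AbstractTestResultAction.class)
    if (testAction != null) {
        testSummary = "Total: ${testAction.totalCount}, Failed: ${testAction.failCount}, Skipped: ${testAction.skipCount}"
    }

    def attachments = [[
        title     : "${env.JOB_NAME} #${env.BUILD_NUMBER}",
        title_link: env.BUILD_URL,
        color     : color,
        text      : "Build ${buildResult ?: 'SUCCESS'}",
        fields    : [[title: 'Test Results', value: testSummary, short: false]]
    ]]

    // the attachments parameter was added in Slack plugin 2.3
    slackSend(attachments: JsonOutput.toJson(attachments))
}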

The end result is a much more informative message.

[Screenshot: the resulting Slack notification with build status and test summary]

Exporting PagerDuty Incident Data

PagerDuty provides a built-in way to export your incident data, but only a limited number of data fields are available on the basic plan. Rather than upgrade, you can use the API to export the data to a CSV. I found this gist listed here. The Python script works great, but some of my incident messages contained JSON data that threw off Excel when opening the CSV. I slightly modified the script with character escaping to work around this (lines 98-107).
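
The escaping itself is nothing fancy. As an illustration of the general idea (the field names below are made up for the example; the real script builds its rows from PagerDuty API responses), quoting every field and flattening newlines and quotes keeps embedded JSON from confusing Excel:

import csv

def sanitize(value):
    """Flatten newlines and stray quotes so JSON blobs in incident
    messages don't break the CSV layout when opened in Excel."""
    if value is None:
        return ''
    return str(value).replace('\r', ' ').replace('\n', ' ').replace('"', "'")

# hypothetical rows; the real script pulls these from the API
incidents = [
    {'id': 'PABC123', 'summary': 'disk full', 'details': '{"host": "web-01"}'},
]

with open('incidents.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(['id', 'summary', 'details'])
    for incident in incidents:
        writer.writerow([sanitize(incident[field]) for field in ('id', 'summary', 'details')])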

Bash Function to SSH into EC2 Instances

I’ve often found myself with an instance ID that I want to log into to look at something. It sucks looking up the IP when you don’t know the DNS name. I’m sure there are other ways to do this, but here is what I came up with.


getec2ip() {
  # look up the private IP for an instance ID
  aws ec2 describe-instances --instance-ids "$1" \
    | jq --raw-output '.Reservations[0].Instances[0].PrivateIpAddress'
}

assh() {
  local host
  host=$(getec2ip "$1")
  ssh "user@${host}"
}

This relies on the AWS CLI and jq to parse out the IP, and it has made it much easier for me to quickly hop onto an instance.

Jenkins Dynamic EC2 Slaves

There is a nice plugin for Jenkins that lets you dynamically add capacity by spinning up EC2 instances on demand and then terminating them once they have sat idle for a configurable amount of time. This is a great way to save money on AWS-based build infrastructure.

Unfortunately, the plugin documentation is really light and there are a few gotchas to look out for.

Security Groups

This field only accepts comma-separated security group IDs, not names. This is frustrating because other fields in the plugin take a space-separated list (e.g. labels).

Running in VPC

If you’re a sane person you’re going to want to run these instances in a private VPC. This is entirely possible but is hidden in the advanced settings. If you expand the advanced settings you’ll see a field to enter your desired subnet ID. Set this to the ID of the private subnet in your VPC you want the instances to run in.

Don’t Rely On the User Data/Init Script to Install Dependencies

This adds a lot of time before the instance comes online and is usable by Jenkins. A better approach is to bake an AMI with all the build dependencies you need; the only delay is then the instance boot time.

This is far from an exhaustive walkthrough but highlights the issues I ran into setting it up.