Useful Shit

Stuff in here is just a random collection of things that I've picked up along the way, inspired by Reddit's r/til.

Think Bash one-liners and useful CLI commands for various things, structured in a way that makes it easy for me to search for when I need it.

Musings

Just a collection of random thoughts and longer-form pieces of writing.

Preamble

May 27th, 2017

I recently started on a new team at Square (the #traffic team). Part of the reason for the switch was getting out of my comfort zone a little, to focus on an area of software engineering other than Data, which I've been doing for the last 2.5 years or so.

Starting on a new team, with new co-workers and new projects presented me with a great opportunity to finally get around to hacking together a blog. It's something I've been meaning to do for a while now.

This is also a chance for me to play around with some Scala. I work primarily in Java and Ruby at Square, but Scala has been on my list of languages to play around with for a while. This blog is written using the Play framework (2.5.x), and I'm using Google Cloud Platform to host it.

I'm going to try and hold myself to one entry a (working) day, with each entry being a short writeup about something that I encountered during the day, or something I've come across while writing these posts with Scala, Play and GCP.

Let's see how we go!

Up and running!

May 27th, 2017

It took me a little while, but I finally got this thing up and running! Hello, world!?

I thought building a site was going to be easy. Sure, it's simple to host static pages built on top of WordPress in PHP, but that's so mid-2000s (LAMP, anyone?!). I was up for more of a challenge when I started building this, so I picked some building blocks that satisfied the following:

  • (Relative) Simplicity - I needed to be able to hack something together quickly and be able to iterate on it easily without my house of cards falling over

  • Familiarity - I wanted to write some code in a language that I have a passing familiarity with

  • Experimental - part of putting this together was an excuse to pick up some new tools and concepts that I don't have a chance to work with in my day-job

  • Open source - there's nothing worse than not being able to crack something open and see how it works. All the building blocks of the site need to be available to anyone else to pick up and use

Backend

A site (usually) needs a webserver backing it. Something to serve up the content to the clients that connect to it. While the initial goal is to serve up static content, I wanted to use something that was going to be a little more versatile and flexible if I wanted to experiment with something fancier in the future.

I use Java at work, so I'm confident working in the JVM ecosystem. The tooling and support is pretty extensive and mature, and it's easy to StackOverflow your way out of a problem if / when you get stuck. I've been learning Scala on the side for a while as an excuse to get my head out of the "everything is a noun" approach to software development in Java (read this if you haven't already), so I wanted to get my teeth sunk into a decent project.

Googling "Scala" and "web development" will probably lead you to the Play framework. It's pretty easy to use if you're like me and have used another MVC framework like Rails. Given that you're writing Scala, it's all statically typed, incuding the HTML templates. Less of those gross runtime errors.

I've recently started doing some Golang at work, so maybe if I get sick of Play I might have a play around with porting some of this to Go. Who says I can't be irresponsible and do a re-write when I want to?!

I've slapped Nginx in front of the backends as a reverse proxy. It's 2017 and it's better than Apache. Sure, I'm not serving up thousands of requests a second, but we use it at work so I know a thing or two about how to configure it. It can cache things nicely, and terminate SSL (when I get around to it!).

Frontend

While HTML was probably the "language" that got me into programming at an early age, building small web pages that I could run on the computer I'd built from parts I'd picked up at garage sales in the small country town in which I grew up (represent!), it's never really been a strong point. Javascript is gross. #sorrynotsorry.

Every week there's a new framework on the Orange Website that all the cool kids are using. It's exhausting to keep up with. So I'm sticking to my roots here and the content for the site is nothing fancy. It's just static HTML with some CSS to make things look pretty.

Given the backend is Play / Scala, maybe I'll get around to checking out something like scala.js ... maybe.

Hosting

I've always been a bit of a Google fan-boy, and have used GCP for a while now, both at work and for various side projects, so this was a pretty easy choice. That said, you could probably throw this site up on AWS or Digital Ocean pretty easily.

I like how easy and fast it is to iterate on GCP relative to the competition. I'm slowly coming around to the idea of "immutable infrastructure", where you can treat your VMs as immutable, expendable and ephemeral. If something stops working, spin up a replacement and keep moving. GCE is pretty good for this, and you can have an instance up and SSH-able in less than a minute.
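To give a rough idea of what that looks like (the instance name, zone, machine type and image here are just placeholders; adjust to taste):

# Spin up a small, disposable VM
$ gcloud compute instances create throwaway-vm \
    --zone us-central1-a \
    --machine-type f1-micro \
    --image-family debian-9 \
    --image-project debian-cloud

# SSH in once it's booted
$ gcloud compute ssh throwaway-vm --zone us-central1-a

# ... and throw it away when you're done with it
$ gcloud compute instances delete throwaway-vm --zone us-central1-a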

Deployment

Ask any hipster what they're using these days for deploying their artisanal, hand-rolled, functional, stateless, reactive microservices and they'll probably drop the C-word. "Containers" are pretty much synonymous with Docker these days (although obviously there are others - see LXC and rkt for two notable examples). Using Docker containers unlocks cool frameworks like Kubernetes for service discovery and deployment. We've got a pretty similar framework at Square, p2, that is heavily inspired by Kubernetes, so the concepts and abstractions are familiar to me, which is a bonus.

Google has a hosted Kubernetes service called Google Container Engine (GKE) which abstracts away the pain of adding and removing VMs from your cluster and getting them talking to one another.
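Spinning up a small cluster and pointing kubectl at it is only a couple of commands. A rough sketch (cluster name, zone and node count are placeholders):

# Create a small cluster
$ gcloud container clusters create blog \
    --zone us-central1-a \
    --num-nodes 1

# Fetch credentials so kubectl talks to the new cluster
$ gcloud container clusters get-credentials blog --zone us-central1-a

# Sanity check
$ kubectl get nodes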

Kubernetes is another tool I can add to the tool-belt. A bunch of companies are already using it (probably in production, but maybe they'd be reluctant to admit that openly). The open source community is huge, and the project is constantly evolving. No doubt things will look different in a month's time. There will be different abstractions to use. Hey - if I ever end up with a database backing this thing (god forbid), I'll be keen to check out how Kubernetes deals with StatefulSets.

So ... here we are. We've got the broad strokes of a website engraved in a single HTML document, checked into a Github repo, building and deploying on GCP. I know it works coz you're reading it, and I know a thing or two from a past life on how to scrape and analyze web logs to tell what kind of browser you're reading it from and where you are! ... or at least where the IP of your Tor exit node appears to be.

Stay tuned ... my creative juices have to refill. Writing is hard.

-nt

Kubernetes and SSL

June 4th, 2017

In 2017 it's a little scary to have to browse to a website that isn't "secure". By secure I mean that it's not using HTTPS to encrypt the traffic to and from the client (your browser) to the server (some boxes in a datacenter somewhere). Even for sites like this, which are serving purely static content, it's enough to get your site pushed down in search engine rankings. Not that I'm one for rankings, but I spent part of my weekend working out how I'd go about getting my site (more) secure, as an exercise in familiarising myself with some of the new Kubernetes features.

SSL Certificates via Let's Encrypt

Let's Encrypt describe themselves as a free and automated Certificate Authority (or CA). They can generate certificates for you which you stick in your webservers to ensure that you get that nice little "Secure" label next to your site's URLs in Chrome.

While I could roll my own automation for certificate generation via Let's Encrypt and renew them periodically (certificates are valid for 90 days), I'm using Kubernetes to manage the deployment of my site, so it makes sense to see what's out there already and make use of someone else's hard work.

The top search result for "kubernetes letsencrypt" is a Github project from a dude called Kelsey Hightower. Kelsey works for Google and does a ton of awesome work in the Kubernetes community. His kube-cert-manager adds some extra functionality to your Kube cluster that lets you manage your Let's Encrypt certificates. The README is a good place to start, but I've summarized the main parts.

The following sections are adapted from Kelsey's repo. Full props to him.

Setup

Create a third-party resource that models a "Certificate" resource:

$ kubectl create -f certificate.yaml

Where the contents of the YAML is as follows:

apiVersion: extensions/v1beta1
kind: ThirdPartyResource
description: "A specification of a Let's Encrypt Certificate to manage."
metadata:
  name: "certificate.stable.hightower.com"
versions:
  - name: v1

You'll need a persistent disk resource to store the certificates that the tool generates:

$ gcloud compute disks create kube-cert-manager --size 10GB

The cert manager also needs to be able to create / delete DNS records for your GCP project. This allows you to prove to Let's Encrypt that you're the one in control of the site. More on that process (the ACME protocol) here. Create a new service account that has admin access to DNS for your project. Save that somewhere locally.
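If you're doing this with gcloud, creating that service account, granting it DNS admin and generating a key looks roughly like the following (the account name and project ID are placeholders):

# Create a service account for the cert manager
$ gcloud iam service-accounts create kube-cert-manager \
    --display-name "kube-cert-manager"

# Give it admin rights over Cloud DNS for the project
$ gcloud projects add-iam-policy-binding your-project-id \
    --member serviceAccount:kube-cert-manager@your-project-id.iam.gserviceaccount.com \
    --role roles/dns.admin

# Generate a key file to hand to Kubernetes in the next step
$ gcloud iam service-accounts keys create service-account.json \
    --iam-account kube-cert-manager@your-project-id.iam.gserviceaccount.com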

Create a secret in your Kubernetes cluster from this secret:

# Create
$ kubectl create secret generic your-site-dns \
  --from-file=path/to/your/service-account.json

# Verify
$ kubectl describe secret your-site-dns

Deploy

The cert manager can now be deployed with the following:

$ kubectl create -f kube-cert-manager.yaml

Where the content of the file is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: kube-cert-manager
  name: kube-cert-manager
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-cert-manager
      name: kube-cert-manager
    spec:
      containers:
        - name: kube-cert-manager
          image: gcr.io/hightowerlabs/kube-cert-manager:0.5.0
          imagePullPolicy: Always
          args:
            - "-data-dir=/var/lib/cert-manager"
            - "-acme-url=https://acme-v01.api.letsencrypt.org/directory"
            - "-sync-interval=30"
          volumeMounts:
            - name: data
              mountPath: /var/lib/cert-manager
        - name: kubectl
          image: gcr.io/google_containers/hyperkube:v1.5.2
          command:
            - "/hyperkube"
          args:
            - "kubectl"
            - "proxy"
      volumes:
        - name: "data"
          emptyDir: {}

Watch the deployment of this pod with:

$ kubectl describe pod kube-cert-manager

Once this is up and running, you're good to start generating certificate resources.

Generate

Certificate resources can be created by moulding the following to suit your requirements. Place in a file (in this case your-site-dot-com.yaml).

apiVersion: "stable.hightower.com/v1"
kind: "Certificate"
metadata:
  name: "something-descriptive"
spec:
  domain: "your.domain.com"
  email: "you@email.com"
  provider: "googledns"
  secret: "service-account-secret"  # This was named your-site-dns above
  secretKey: "service-account.json"

The most important part is the domain. In my case, I have my main site, as well as a staging domain, so I created a certificate for each (stage.nicktrave.rs as well as nicktrave.rs). Note that you could probably wildcard your domain so you don't have to generate as many certificates. I haven't experimented with this.

Generate the certificates:

$ kubectl create -f your-site-dot-com.yaml

At this point if you tail the logs of the kube-cert-manager pod you'll see that it's requesting certificates for you from Let's Encrypt. Part of this process (as alluded to earlier) is to add some DNS records so that Let's Encrypt can verify that it's actually you making these requests for certs. Given that you have control of your site's DNS, Let's Encrypt will respond to a request for a certificate by asking you to create a DNS record. You go ahead and do this, Let's Encrypt checks that the DNS records it asked for have been created, and then goes ahead and sends you the cert.
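Something like the following lets you watch that happening (the pod name will be whatever the Deployment generated):

# Find the cert manager pod and follow its logs
$ kubectl get pods -l app=kube-cert-manager
$ kubectl logs -f <pod-name> -c kube-cert-manager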

Confirm

If kube-cert-manager was successful, it will create a new secret for each domain that you requested. You can list the certificates and secrets via:

$ kubectl get certificates
$ kubectl get secrets

Inspecting each secret you'll notice that there's a .key and .crt file inside. These are what you'll provide the load balancer for it to set up SSL termination.
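You can poke at each secret to confirm both pieces are in there, something like:

# List the keys stored in the generated secret (named after the domain)
$ kubectl describe secret your.site.com

# Or dump the whole thing (base64-encoded) as YAML
$ kubectl get secret your.site.com -o yaml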

TLS Termination

By far the easiest way of securing your site that runs on GCP is to place the certificates in Google's L7 load balancer. If you're running on Kubernetes, you can use an Ingress resource to manage this for you. The details, with extensive examples, can be found here, but the TLDR is that you need to create a new Deployment, which you expose via a Service. Then you use an Ingress to tell Google's LBs about it. The Ingress also defines the certificate that you want to use, as well as the hostnames that it will cover. Here's an example:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-your-site
  annotations:
    kubernetes.io/ingress.global-static-ip-name: my-site
spec:
  tls:
   - secretName: your.site.com
  backend:
    serviceName: service-for-your-site
    servicePort: 80

Note the extra annotation there that tells Kubernetes that I want this Ingress to be exposed to the outside world via a static global IP. I've got an additional A-record in my DNS entries that resolves to this IP.
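The static IP itself needs to exist ahead of time, and its address is what goes into the A-record. Roughly (the name my-site matches the annotation above):

# Reserve a global static IP for the load balancer
$ gcloud compute addresses create my-site --global

# Grab the address to put in your DNS A-record
$ gcloud compute addresses describe my-site --global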

After creating the Ingress, it takes a while for it to create the entries in the load balancers. The load balancers use health checks to verify that your backends are healthy. You can see these under Compute / Health Checks.
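You can also keep an eye on things from the Kubernetes side:

# The Ingress gets an external IP once the load balancer is wired up
$ kubectl get ingress ingress-your-site

# describe shows the backends the LB knows about (and, eventually, their health)
$ kubectl describe ingress ingress-your-site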

And voila! Now your site should be talking HTTPS. Depending on how your backends are set up, you'll probably be handling HTTP too. More on how to deal with that now.

Extras

While my site uses HTTPS now, that's only the case if you explicitly ask to speak HTTPS. If a client was to ask for the same content from a http:// resource, the backend would still serve it up to them. Because I'm using Nginx as a reverse-proxy in front of my backends, there's a nice little trick you can use to redirect any request for http content to https.

Place the following in your nginx.conf:

if ($http_x_forwarded_proto = "http") {
  return 301 https://$host$request_uri;
}

This works because Google's load balancers set the X-Forwarded-Proto header on each request. Nginx examines this header and, upon seeing HTTP as the request protocol, responds to the load balancer with a 301 Moved Permanently. This is considered to be a best practice for upgrading users from HTTP to HTTPS.
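A quick way to convince yourself it's working from the outside (hostname is a placeholder):

# Ask for the plain HTTP version; expect a 301 with a Location: https://... header
$ curl -sI http://your.site.com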

Next steps

While traffic to and from the site is now much more secure, and clients know they are talking to servers who are who they say they are, my site still isn't as secure as I'd like it to be. Traffic is only encrypted up to Google's load balancer. While I have ample faith in Google's ability to handle my boring content traffic within their own datacenters, an even safer solution would be to encrypt traffic between the LBs and my webservers (nginx). Traffic from the webserver to the application server is, in my case, on the same VM, so it's less important for it to be encrypted for the last stage of its journey.

I made a fruitless attempt to get the LBs to talk to my backends using HTTPS, but it looks like this is a shortcoming of the current setup.

While Google's load balancers are easy to use, they are also pretty expensive (roughly $20 a month for a single rule). An alternative approach would be to place an nginx-ingress-controller in front of your services, which obviates the need for Google's Load Balancer altogether. That's some more material for another post.

ConfigMaps, Secrets and InitContainers

January 26th, 2019

New Year (kind of), new job, and another chance to say that I'm going to be serious about writing longer form pieces of content for my site. The new job part is probably the biggest motivation, as I'm going to be working extensively with scalable cloud infrastructure built out on top of k8s. This serves as a great opportunity to write about what I'm learning along the way. Fun!

This post is going to detail a problem I had to work around this week related to configuration changes and version control.

Specifically, the problem I was faced with was in a project that had a directory structure like the following:

repo/
  config-file1.ini
  config-file2.ini
  deployment.yaml

The config-file*.ini files looked like the following:

...

[database]
password = changeme
host = localhost

...

The deployment.yaml file represents a typical k8s Deployment. In this case, the end result is simply printing out the contents of the configuration files, to prove that we can see everything that we need to run our app (i.e. the passwords):

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    spec:
      containers:
      - name: container-1
        image: busybox
        command: ['sh', '-c', 'cat /etc/config/*; sleep 100']
        volumeMounts:
        - name: secrets
          mountPath: /etc/config
          readOnly: true
      volumes:
        - name: secrets
          secret:
            secretName: my-secret
            items:
            - key: config-file-1
              path: config1.ini
            - key: config-file-2
              path: config2.ini

The config files contained secret information that couldn't be checked in (obviously), but it still had to be accessible to the k8s cluster. A Secret was used to store the contents of the .ini files in their entirety. Herein lies the problem!

One of the benefits (in theory) of having configuration checked into version control, and having a solid CI pipeline is that when someone changes a configuration file, a pipeline is triggered to build and redeploy everything. In our case, this is done with Helm and Jenkins.

Unfortunately, the Secret had been created manually, using something like the following:

apiVersion: v1
kind: Secret
metadata:
  name: my-secret
stringData:
  config-file-1: |
   ...
   [database]
   password = real-password-1
   host = localhost
   ...
  config-file-2: |
   ...
   [database]
   password = real-password-2
   host = localhost
   ...
type: Opaque

The password had been put into place for the purposes of creating the Secret initially, and this had been stored in the cluster. The file had then been deleted.

The problem arose when trying to alter some other configuration in the .ini files (for example, changing the hostname for the database), and assuming that the CI pipeline would push these changes out into the cluster. Given that the configuration was being mapped into the containers from the static Secret, getting the configuration change reflected would mean updating the Secret manually: creating a new yaml file like before and applying the change to the cluster. Not ideal.

Here's a little solution I came up with instead that, at its core, relies on a ConfigMap for the configuration template, a Secret for the passwords, and an InitContainer to take the template and the passwords and populate a file that can be used by the main container. Easy! ... That said, there were some gotchas along the way that I want to point out too.

A minimal Secret for the passwords

The only "secret" information that is contained in the .ini files is the password, so it makes more sense to make a Secret that contains just the password.

Here's the config:

apiVersion: v1
kind: Secret
metadata:
  name: passwords
stringData:
  password-1: real-password-1
  password-2: real-password-2
type: Opaque

These were then created in the cluster, and the files subsequently deleted, like before.
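Assuming the Secret above lives temporarily in a file called passwords.yaml, that's just:

# Create the Secret in the cluster, then get rid of the local copy
$ kubectl create -f passwords.yaml
$ rm passwords.yaml

# Verify
$ kubectl describe secret passwords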

Note that this definitely isn't best practice in terms of security. There are safer and more reliable ways of creating secrets that don't rely on generating files locally and storing them on disk temporarily. That's outside the scope of this post though.

ConfigMaps for configuration templates

With the passwords now in their own dedicated Secret, we can move the configuration files out of the existing Secret and into a ConfigMap, which ended up looking something like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: templates
data:
  config-1: |
    [database]
    password = PASSWORD
    host = localhost
  config-2: |
    [database]
    password = PASSWORD
    host = localhost

Note that in this example, we end up no longer requiring the .ini files, opting to have this configuration moved directly into the ConfigMap. In our setup we're using Helm, which allows us to use their templating language to inline the contents of files directly into the yaml, so we can still have the .ini files. I'll leave that for another post.

Volumes

The eventual goal is to be able to write the passwords from the Secret into the templates contained in the ConfigMap, which are mounted into the main container.

Here's a first pass at the deployment.yaml for this:

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    spec:
      containers:
      - name: container-1
        image: busybox
        command: ['sh', '-c', 'cat /etc/config/*; sleep 100']
        volumeMounts:
        - name: templates
          mountPath: /etc/templates
          readOnly: true
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
      volumes:
        - name: templates
          configMap:
            name: templates
            items:
            - key: config-1
              path: config1.ini
            - key: config-2
              path: config2.ini
        - name: secrets
          secret:
            secretName: passwords

Even though we have the templates and passwords mounted into the container, we're not actually doing anything with them here. We still need to combine them. This is where the InitContainers are useful.

InitContainers

InitContainers allow you to run one or more containers that do some kind of setup before the main containers start. This might be setting an environment variable, or blocking on some condition before allowing the main containers to launch. In our use-case, we're going to use them to put the passwords into the templates.

Here's what my second pass looked like:

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    spec:
      initContainers:
      - name: init-password-1
        image: busybox
        command: ['sh', '-c', 'sed -i "s/PASSWORD/$(cat /etc/secrets/password-1)/" /etc/config/config1.ini']
        volumeMounts:
        - name: templates
          mountPath: /etc/config
          readOnly: false
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
      - name: init-password-2
        image: busybox
        command: ['sh', '-c', 'sed -i "s/PASSWORD/$(cat /etc/secrets/password-2)/" /etc/config/config2.ini']
        volumeMounts:
        - name: templates
          mountPath: /etc/config
          readOnly: false
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
      containers:
      - name: container-1
        image: busybox
        command: ['sh', '-c', 'cat /etc/config/*; sleep 100']
        volumeMounts:
        - name: templates
          mountPath: /etc/config
          readOnly: true
      volumes:
        - name: templates
          configMap:
            name: templates
            items:
            - key: config-1
              path: config1.ini
            - key: config-2
              path: config2.ini
        - name: secrets
          secret:
            secretName: passwords

We've used two InitContainers that mount the Secret read-only at /etc/secrets and the ConfigMap read-write at /etc/config, and then inline the passwords. The same ConfigMap is then mounted into the main container at /etc/config, and the container reads the contents of the updated config files. Great!

Unfortunately, there's a subtle problem: the ConfigMap that is mounted into the InitContainers isn't actually read-write, even though we've asked for it to be. This makes sense, given that it would be weird for a container to make changes to the contents of the underlying ConfigMap (would those changes be reflected in the map stored in the cluster? and so on). The same read-only constraints apply to Secrets.

There's a nice explanation for it here.
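You can see this for yourself with the second pass above; the pod never makes it past its init containers because the sed fails on the read-only mount (the pod name is a placeholder):

# Check why the pod is stuck in Init, then look at the init container's logs
$ kubectl get pods
$ kubectl logs <pod-name> -c init-password-1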

emptyDir volumes

With the ConfigMap and Secret being read-only mounts, we need a way to generate the configuration and persist that somewhere temporarily and make that accessible to the main container. We can use an emptyDir volume for that!

Here's what the final configuration looked like:

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    spec:
      initContainers:
      - name: init-password-1
        image: busybox
        command: ['sh', '-c', 'sed "s/PASSWORD/$(cat /etc/secrets/password-1)/" /etc/templates/config1.ini.tmpl > /etc/config/config1.ini']
        volumeMounts:
        - name: templates
          mountPath: /etc/templates
          readOnly: true
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
        - name: configs
          mountPath: /etc/config
          readOnly: false
      - name: init-password-2
        image: busybox
        command: ['sh', '-c', 'sed "s/PASSWORD/$(cat /etc/secrets/password-2)/" /etc/templates/config2.ini.tmpl > /etc/config/config2.ini']
        volumeMounts:
        - name: templates
          mountPath: /etc/templates
          readOnly: true
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
        - name: configs
          mountPath: /etc/config
          readOnly: false
      containers:
      - name: container-1
        image: busybox
        command: ['sh', '-c', 'cat /etc/config/*; sleep 100']
        volumeMounts:
        - name: configs
          mountPath: /etc/config
          readOnly: true
      volumes:
        - name: templates
          configMap:
            name: templates
            items:
            - key: config-1
              path: config1.ini.tmpl
            - key: config-2
              path: config2.ini.tmpl
        - name: secrets
          secret:
            secretName: passwords
        - name: configs
          emptyDir: {}

We're now mounting the emptyDir volume into each of the InitContainers as read-write, inlining the passwords into the templates, and persisting the result in the emptyDir, which is then made accessible to the main container. The container only has read-only access to the final configuration file, so there's no chance it can try and alter the contents once they are written.

Here's the final output, that proves we wired it all up correctly:

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
deployment-6d8db67956-4vnzr   1/1       Running   1          1m

$ kubectl logs deployment-6d8db67956-4vnzr
[database]
password = real-password-1
host = localhost
[database]
password = real-password-2
host = localhost

And done!

Using tmpfs for secrets

I mentioned as an aside above that it's usually not the best idea to persist passwords or key material to disk, and that's what we're doing here. That said, if you're using an emptyDir, you can tell it to use an in-memory tmpfs as the storage medium, which is much safer. To do that, we alter the volume definition in the Deployment as follows:

volumes:
...
- name: configs
  emptyDir:
    medium: Memory

And prove to ourselves that it worked:

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
deployment-5-6f7c6c7787-95vtr   1/1       Running   1          1m

$ kubectl exec -it deployment-5-6f7c6c7787-95vtr -- /bin/sh -c "df -h | grep '/etc/config'"
tmpfs                     1.8G      8.0K      1.8G   0% /etc/config

Summary

So that's how ConfigMaps, Secrets and InitContainers can be combined to keep secret information such as passwords separate from the static configuration that is checked into version control, while ensuring that everything is updated when the configuration changes, rather than having to update Secrets manually.