Saturday, April 07, 2018

nginx reverse-proxy tricks for jwt and csrf

I've been doing devops work over the last few months at CDIS. One of the tasks I worked on recently was to extend the nginx configuration (on github as a kubernetes configmap here) for the gen3 data commons stack.

First, we added end-user data to the proxy log by extracting user information from the JWT in the auth cookie. We took advantage of nginscript (a subset of javascript that runs inside nginx) to import some standard javascript code (also in the kubernetes configmap) that parses the JWT.
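As a rough illustration of the JWT parsing step, here is a minimal sketch in plain Node.js javascript. It is not the gen3 code: the function names are made up for this example, nginscript does not provide Node's Buffer, and the "sub" claim is just the standard JWT subject claim, so the real token layout may differ. It only decodes the payload for logging and does not verify the signature.

```javascript
// Decode (NOT verify) the payload segment of a JWT.
// A JWT is three base64url segments joined by dots: header.payload.signature
function decodeJwtPayload(token) {
  const parts = String(token || "").split(".");
  if (parts.length !== 3) {
    return null;
  }
  // convert base64url to plain base64 before decoding
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  try {
    return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
  } catch (err) {
    return null; // payload was not valid JSON
  }
}

// Hypothetical helper: pick a user id to write to the proxy log.
// Assumes the user lives in the standard "sub" claim.
function userForLog(token) {
  const payload = decodeJwtPayload(token);
  return payload && payload.sub ? String(payload.sub) : "unknown";
}

// Build a throwaway unsigned token for demonstration purposes only
function b64url(obj) {
  return Buffer.from(JSON.stringify(obj)).toString("base64")
    .replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
const demoToken = [b64url({ alg: "none" }), b64url({ sub: "fred" }), "sig"].join(".");
console.log(userForLog(demoToken));
```

Anything that does not look like a three-part token falls back to "unknown", so a malformed cookie never breaks the log line.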

Next, we added a CSRF guard that verifies that if a CSRF cookie is present, then it matches a CSRF header. The logic is a little clunky (in github here), because of the restrictions on conditionals in nginx configuration, but it basically looks like the sample configuration below - where $access_token is the JWT auth token (either from a header for non-web clients, or a cookie for web browser clients), and an "ok-SOMETHING" $csrf_check is only required for browser clients. We do not enforce the HTTP-header based CSRF check in the proxy for endpoints that may be accessed by traditional HTML form submissions (that embed the token in the form body), or for endpoints that are accessed by third party browser frontends (like jupyterhub) that may implement a CSRF guard in a different way. Finally, there's also a little bit of logic that auto-generates a CSRF cookie if it's not present - a javascript client in the same domain can read the cookie to get the value to put in the X-CSRF-Token header.

          set $access_token "";
          set $csrf_check "ok-tokenauth";
          if ($cookie_access_token) {
              set $access_token "bearer $cookie_access_token";
              # cookie auth requires csrf check
              set $csrf_check "fail";
          }
          if ($http_authorization) {
              # Authorization header is present - prefer that token over cookie token
              set $access_token "$http_authorization";
          }

          # CSRF check
          # This block requires a csrftoken for all POST requests.
          if ($cookie_csrftoken = $http_x_csrf_token) {
            # this will fail further below if cookie_csrftoken is empty
            set $csrf_check "ok-$cookie_csrftoken";
          }
          if ($request_method != "POST") {
            set $csrf_check "ok-$request_method";
          }
          if ($cookie_access_token = "") {
            # do this again here b/c empty cookie_csrftoken == empty http_x_csrf_token - ugh
            set $csrf_check "ok-tokenauth";
          }

          location /index/ {
              if ($csrf_check !~ ^ok-\S.+$) {
                return 403 "failed csrf check";
              }
              if ($cookie_csrftoken = "") {
                add_header Set-Cookie "csrftoken=$request_id$request_length$request_time$time_iso8601;Path=/";
              }

              proxy_pass http://indexd-service/;
          }
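
The chain of nginx "if" blocks is easier to follow as ordinary code. Here is a sketch in javascript that mirrors the same sequence of assignments. The function and field names are invented for this example, and it ignores the nginx quirk that treats the string "0" as false.

```javascript
// Mirror of the nginx logic: each "if" may overwrite the result of the previous one.
// req fields (hypothetical names): method, cookieAccessToken, cookieCsrf, headerCsrf
function csrfCheck(req) {
  const method = req.method || "GET";
  const cookieAccessToken = req.cookieAccessToken || "";
  const cookieCsrf = req.cookieCsrf || "";
  const headerCsrf = req.headerCsrf || "";

  let check = "ok-tokenauth";
  if (cookieAccessToken) {
    check = "fail";               // cookie auth requires a csrf check
  }
  if (cookieCsrf === headerCsrf) {
    check = "ok-" + cookieCsrf;   // just "ok-" when both are empty
  }
  if (method !== "POST") {
    check = "ok-" + method;       // only POST requests are guarded
  }
  if (cookieAccessToken === "") {
    check = "ok-tokenauth";       // header-token clients skip the check
  }
  return check;
}

// Same regex as the location block: "ok-" plus at least two more characters
function isAllowed(check) {
  return /^ok-\S.+$/.test(check);
}

const browserPost = { method: "POST", cookieAccessToken: "jwt", cookieCsrf: "abc123", headerCsrf: "abc123" };
const forgedPost = { method: "POST", cookieAccessToken: "jwt", cookieCsrf: "abc123" };
console.log(isAllowed(csrfCheck(browserPost)), isAllowed(csrfCheck(forgedPost)));
```

The regex is what catches the empty-token case: two empty values compare equal and produce a bare "ok-", which does not match.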

Thursday, November 23, 2017

Jenkins Backup Pipeline

I recently set up a Jenkins CICD service to complement the Travis based automation already in use where I work. I worked with job-based Jenkins workflows in the past - where we set up chains of interdependent jobs (ex: build, publish assets, deploy to dev environment, ...), but I took this opportunity to adopt Jenkins' new (to me) Pipeline pattern, and I'm glad I did.

A Jenkins pipeline defines (via a groovy DSL) a sequence of steps that execute together in a build under a single Jenkins job. Here's an example pipeline we use to back up our Jenkins configuration to S3 every night.


pipeline {
  agent any

  stages {
    stage('BuildArchive') {
      steps {
        echo "BuildArchive $env.JENKINS_HOME"
        sh "tar cvJf backup.tar.xz --exclude '$env.JENKINS_HOME/jobs/[^/]*/builds/*' --exclude '$env.JENKINS_HOME/jobs/[^/]*/last*' --exclude '$env.JENKINS_HOME/workspace' --exclude '$env.JENKINS_HOME/war' --exclude '$env.JENKINS_HOME/jobs/[^/]*/workspace/'  $env.JENKINS_HOME"
      }
    }
    stage('UploadToS3') {
      steps {
        echo 'Upload to S3!'
        // date +%u is the day of the week (1-7), so we keep a rolling week of backups
        sh 'aws s3 cp --sse AES256 backup.tar.xz s3://cdis-terraform-state/JenkinsBackup/backup.$(date +%u).tar.xz'
      }
    }
    stage('Cleanup') {
      steps {
        echo 'Cleanup!'
        sh 'rm -f backup.tar.xz'
      }
    }
  }
  post {
    success {
      slackSend color: 'good', message: 'Jenkins backup pipeline succeeded'
    }
    failure {
      // Slack's named attachment colors are good, warning, and danger
      slackSend color: 'danger', message: 'Jenkins backup pipeline failed'
    }
    unstable {
      slackSend color: 'danger', message: 'Jenkins backup pipeline unstable'
    }
  }
}
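
One detail worth spelling out is the backup file name. The upload step embeds the output of "date +%u" (the day of the week, 1 for Monday through 7 for Sunday) in the S3 key, so each night's upload overwrites the backup from the same weekday a week earlier. Here is a small javascript sketch of that naming scheme. The function names are made up, and it uses UTC to keep the example deterministic, where the date command uses the server's local time.

```javascript
// Equivalent of `date +%u`: 1 = Monday ... 7 = Sunday
function isoWeekday(date) {
  const day = date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  return day === 0 ? 7 : day;
}

// S3 key under the bucket for the backup taken on the given date
function backupKey(date) {
  return "JenkinsBackup/backup." + isoWeekday(date) + ".tar.xz";
}

const thursday = new Date(Date.UTC(2017, 10, 23));     // Thursday, November 23, 2017
const nextThursday = new Date(Date.UTC(2017, 10, 30)); // one week later
console.log(backupKey(thursday));
console.log(backupKey(thursday) === backupKey(nextThursday));
```

The bucket never holds more than seven archives, with no separate cleanup job needed.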

If you are a Jenkins user, then take the time to give Pipelines a try. If you're also using github or bitbucket - then look into Jenkins' support for organizations, which works nicely with pull-request based workflows. Also try the new Blue Ocean UI - it's designed with pipelines in mind.

Saturday, August 19, 2017

stuff an app into a docker image

I had some fun over the last week setting up a docker image for an S3-copy utility, and integrating it into the gulp build for my site. I pushed the image for the s3cp app up to dockerhub. I use s3cp to sync a local build of the site up to an S3 bucket. I could probably coerce the AWS CLI (aws s3 sync) into doing the same job, but I wrote this app a while ago, and it includes functionality to automatically gzip-compress each asset before upload, and set up various HTTP headers on the asset (content-encoding, cache-control, content-type, and etag). For example - inspect the headers (using a browser's developer tools) on any asset the site serves.
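
To make the compress-and-tag step concrete, here is a minimal Node.js sketch of the idea. It is not the s3cp code (which is scala). The function name, the content-type table, the cache-control value, and the choice of an md5 of the compressed body for the etag are all assumptions for illustration.

```javascript
const crypto = require("crypto");
const path = require("path");
const zlib = require("zlib");

// Small, hypothetical extension-to-type table
const CONTENT_TYPES = {
  ".js": "application/javascript",
  ".css": "text/css",
  ".html": "text/html",
  ".json": "application/json"
};

// Gzip an asset and compute the headers to store alongside it
function prepareAsset(fileName, content) {
  const body = zlib.gzipSync(Buffer.from(content));
  const ext = path.extname(fileName).toLowerCase();
  return {
    body: body,
    headers: {
      "content-encoding": "gzip",
      "content-type": CONTENT_TYPES[ext] || "application/octet-stream",
      "cache-control": "max-age=3600", // assumed value, tune per asset type
      "etag": crypto.createHash("md5").update(body).digest("hex")
    }
  };
}

const asset = prepareAsset("511.js", "console.log('hello');");
console.log(asset.headers);
```

The upload code would then pass asset.body and asset.headers to whatever S3 client it uses.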

A few s3cp details - it's a simple scala application with two command line flags (-config aws-key aws-secret, -copy source destination). The '-config' command saves the AWS credential under ~/.littleware/aws/. The code could use a little love - it is hard-coded to use the AWS Virginia region, it could use a '-force' option with '-copy', and its error messages are annoying - but it works for what I need. The source is on github (git clone ...; cd littleware/webapp/littleApps/s3Copy; gradle build; gradle copyToLib; cli/ --help).

I pushed a binary build of s3cp to dockerhub. I run it like this. First, I register my AWS credentials with the app. We only need to configure s3cp once, but we'll want to mount a docker volume to save the configuration to (I need to wire up a more secure way to save and pass the AWS secrets):

docker volume create littleware

docker run -it -v littleware:/root/.littleware \
    -v /home/reuben:/mnt/reuben \
    --name s3cp --rm frickjack/s3cp:1.0.0 \
    -config aws-key aws-secret

After the configuration is done I use s3cp with '-copy' commands like this:

docker run -it -v littleware:/root/.littleware \
    -v /home/reuben:/mnt/reuben \
    --name s3cp --rm frickjack/s3cp:1.0.0 \
    -copy /mnt/reuben/Code/littleware-html5Client/build/ s3://

I added a gulp task to the code to simplify the S3 deploy - from gulpfile.js:

gulp.task( 'deploy', [ 'compileclean' ], function(cb) {
    const pwdPath = process.cwd();
    const imageName = "frickjack/s3cp:1.0.0";
    const commandStr = "yes | docker run --rm --name s3gulp -v littleware:/root/.littleware -v '" +
        pwdPath + ":/mnt/workspace' " + imageName + " -copy /mnt/workspace/build/ s3://";

    console.log( "Running: " + commandStr );

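    // exec is assumed to be child_process.exec, required near the top of gulpfile.js
    exec( commandStr, 
        function (err, stdout, stderr) {
            if ( err ) {
                //reject( err );
                cb( err );  // report the failure to gulp
            } else {
                cb();       // signal that the task finished
            }
        }
    );
});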

Sunday, August 13, 2017

versioning js and css with gulp-rev and gulp-revReplace

I've discussed before how I run my site on S3, but one issue I did not address was how to version updates, so that each visitor loads a consistent set of assets. Here's the problem.

  • on Monday I publish v1 assets to the S3 bucket behind the site
  • later on Monday Fred visits, and his browser caches several v1 javascript and CSS files - a.js, b.js, x.css, y.css, ...
  • on Tuesday I publish v2 assets to S3 - just changing a few files
  • on Wednesday Fred visits, but for whatever reason his browser cache updates b.js to v2, but loads v1 of the other assets from cache

On Wednesday Fred loads an inconsistent set of assets that might not work together. There are several approaches people take to avoid this problem. Surma gave a good overview of HTTP cache headers and thinking in this talk on YouTube, and Jake Archibald goes into more detail in this bLog (we set cache-headers directly on our S3 objects).

Long story short - I finally wired up my gulpfile with gulp-rev and gulp-rev-replace to add a content hash to the javascript and css file names. Each visitor to the site is now guaranteed to load a consistent set of assets, because an asset's name changes when its content changes. I was really happy to find gulp-rev - it just takes care of things for me. The only gotcha is that gulp-rev-replace does not like to work with relative asset paths (ex - <script src="511.js">), so I had to update a few files to use absolute paths (src="/511/511.js") - otherwise things worked great.
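
The idea behind the two plugins fits in a few lines of Node.js. This is a sketch of the technique, not the plugins' actual code. The function names are made up, and the name-hash.ext format with a 10 character md5 prefix is an assumption loosely modelled on what gulp-rev produces.

```javascript
const crypto = require("crypto");
const path = require("path");

// Derive a file name that changes whenever the content changes
function revName(fileName, content) {
  const hash = crypto.createHash("md5").update(content).digest("hex").slice(0, 10);
  const ext = path.extname(fileName);
  const base = fileName.slice(0, fileName.length - ext.length);
  return base + "-" + hash + ext;
}

// Rewrite every reference to an original path with its revved path
function revReplace(html, manifest) {
  let result = html;
  for (const original of Object.keys(manifest)) {
    result = result.split(original).join(manifest[original]);
  }
  return result;
}

// manifest maps absolute asset paths to their revved names
const manifest = {
  "/511/511.js": revName("/511/511.js", "console.log('v1');")
};
const page = revReplace('<script src="/511/511.js"></script>', manifest);
console.log(page);
```

The plain string match in revReplace also hints at why absolute paths are easier to handle: a bare "511.js" could match unrelated text or resolve differently depending on which page references it.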

Monday, August 07, 2017

use docker's overlay2 storage driver on Ubuntu 16.04+

Docker defaults to using its aufs storage driver on Ubuntu 16.04, but the system has a 4.4 kernel, which is recent enough for overlay2, so it's probably a good idea to switch over to the overlay2 driver.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial

$ uname -r

$ sudo cat /etc/docker/daemon.json
{
  "storage-driver": "overlay2",
  "log-driver": "journald"
}

Monday, June 05, 2017

S3+CloudFront+ACM+Route53 = serverless http2 https site with CDN

The other night I logged into the AWS web console, and upgraded my little S3 hosted site to http2 with TLS - which opens the door for adding a service worker to the site. As a side benefit the site is behind the CloudFront CDN. I also moved the domain's authoritative DNS to Route 53 earlier in the week - just to consolidate that functionality under AWS.

It's ridiculous how easy it was to do this upgrade - I should have done it a while ago:

  • create the CloudFront distribution
  • configure the CDN with a certificate set up with the AWS Certificate Manager
  • update DNS for the domain to reference the CDN hostname - Route53 supports an 'alias' mechanism that exposes an A record, but you can also just use a CNAME if you have another DNS provider

Anyway - this is fun stuff to play with, and my AWS bill will still be less than $2 a month.

Monday, May 29, 2017

Content Security Policy and S3 hosted site

When I went to implement a content security policy on my site, I was disappointed to realize that the 'Content-Security-Policy' header is not one of the standard headers supported by S3's metadata API. This bLog explains an approach using lambda on the CloudFront edge, but that looks crazy.

Fortunately - it turns out we can add a basic security policy to a page with a meta tag, so I added a <meta http-equiv="Content-Security-Policy" ...> tag to the nunjucks template that builds the site's html pages at gulp compile time. The full code is on github:

<meta http-equiv="Content-Security-Policy" 
  content="default-src 'none'; 
  img-src 'self' data:; 
  script-src 'self'; 
  style-src 'self'; 
  object-src 'none'; 
  font-src 'self'">
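
If the policy grows, it can be easier to keep the directives in a javascript object in the gulpfile and generate the tag from that. Here is a minimal sketch. The function names and the idea of passing the result into the template are assumptions, not what the site's build currently does.

```javascript
// Join a { directive: [sources] } map into a policy string
function buildCsp(directives) {
  return Object.keys(directives)
    .map((name) => name + " " + directives[name].join(" "))
    .join("; ");
}

// Wrap the policy in the meta tag form that browsers accept
function cspMetaTag(directives) {
  return '<meta http-equiv="Content-Security-Policy" content="' +
    buildCsp(directives) + '">';
}

const policy = {
  "default-src": ["'none'"],
  "img-src": ["'self'", "data:"],
  "script-src": ["'self'"],
  "style-src": ["'self'"],
  "object-src": ["'none'"],
  "font-src": ["'self'"]
};
console.log(cspMetaTag(policy));
```

Note that a few directives (frame-ancestors, report-uri, sandbox) are ignored when the policy is delivered in a meta tag, so this approach only covers the basics.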