Sunday, June 30, 2019

Hydrate an AWS Account

TL;DR

So you just got your brand spankin' new AWS account, and your team is launching like a rocket into the cloud. How should you set this account thing up? Before playing with EC2 instances and S3 buckets and all the other toys at the application layer of the stack, you should figure out how you want to authenticate, authorize, and monitor users manipulating your base cloud infrastructure.

The following process describes one approach to inflating an AWS account before beginning application development. Although the steps taken here are only suitable for a single account setup for an individual or small team, the approach (setting up authentication, roles for authorization, monitoring, a simple budget, and basic alerts) generalizes for use with a multi-account AWS organization.

The Plan

Here's what we're going to do:

  • log in as the account's root user, and set up an IAM bootstrap user with admin privileges, so we can acquire credentials to run through a suite of account-bootstrap scripts.

  • run an accountBootstrap script to set up some basic infrastructure for deploying cloudformation stacks.

  • deploy a series of cloudformation stacks

    • IAM groups and roles for:
      • administrators - with write access to IAM and cloudtrail as well as operator permissions
      • operators - with write access to whatever services you want available for use in the account, but only read access to cloudtrail and IAM
      • developers - with access required to develop, test, deploy, and monitor applications that use infrastructure put into place by administrators and operators
    • an SNS topic for publishing alert notifications
    • a cloudtrail that logs updates to your account
    • a set of cloudwatch alarms that publish notifications to administrators (via the SNS topic described above) when various events occur:
      • IAM policy changes
      • root account access
      • budget limit exceeded
      • guard duty event
  • finally - set up an initial administrator user using the new infrastructure, and delete the temporary bootstrap user

Set up an AWS bootstrap user

Log in to the root account, and do the following:

  • enable MFA on the root account
  • set up a bootstrap IAM user with MFA enabled and admin privileges:
{
    "Effect": "Allow",
    "Action": "*",
    "Resource": "*",
    "Condition": {
        "Bool": {
            "aws:MultiFactorAuthPresent": "true"
        }
    }
}
  • download the access key pair for the new user, and set up ~/.aws/credentials and ~/.aws/config - ex:
[profile bootstrap-ohio]
region = us-east-2
output = json
mfa_serial = arn:aws:iam::123456789:mfa/bootstrap
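By the way - if you'd rather script the bootstrap user than click through the console, the IAM calls look something like this (a sketch, run with root credentials; policy.json is assumed to wrap the statement above in a standard policy document with a Version and a Statement list):

aws iam create-user --user-name bootstrap
aws iam put-user-policy --user-name bootstrap \
    --policy-name bootstrap-admin \
    --policy-document file://policy.json
# download the key pair for ~/.aws/credentials
aws iam create-access-key --user-name bootstrap
# MFA enrollment (create-virtual-mfa-device, enable-mfa-device) is easier in the console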

Install software tools

These instructions assume your command shell has access to the following tools: bash, the jq JSON tool, git, and the aws CLI.

  • download the cloudformation templates and helper scripts from our git repository:

    git clone https://github.com/frickjack/misc-stuff.git
  • add the arun tool to your command path

    # assuming you're running a bash shell or similar
    alias arun="bash $(pwd)/misc-stuff/AWS/bin/arun.sh"
    export LITTLE_HOME="$(pwd)/misc-stuff/AWS"
  • run the bootstrap script - it does the following:

    • deploys an account-wide block on public S3 access
    • creates an S3 bucket for cloudformation templates
    • sets a password policy for IAM users

ex:

export AWS_PROFILE="bootstrap-ohio"
arun accountBootstrap
  • finally - prepare the inputs to our cloudformation stacks.
    • make a copy of the account-specific stack-parameters:
      cp -r misc-stuff/AWS/db/cloudformation/frickjack misc-stuff/AWS/db/cloudformation/YOUR-ACCOUNT
    • make whatever changes are appropriate for your account. For example - change the SNS notify e-mail in misc-stuff/AWS/db/cloudformation/YOUR-ACCOUNT/accountSetup/snsNotify.json
    • customize the cloudformation templates under misc-stuff/AWS/lib/cloudformation/ for your account. For example - the iamSetup.json template sets up an IAM policy that allows access to the S3, lambda, and API Gateway API's, because I'm interested in those serverless technologies, but you may want to add permissions for accessing the EC2 and VPC API's.
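Before deploying anything else, it's worth a quick sanity check that the bootstrap profile (and its MFA prompt) works:

export AWS_PROFILE="bootstrap-ohio"
# should print the bootstrap user's account id and ARN
aws sts get-caller-identity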

What's the idea?

Before we start deploying stacks, let's talk about the ideas we're implementing.

Authentication

The Right Way to Authenticate

First, authentication - how should a user prove who they are? AWS IAM has primitives for setting up users and groups, but that's not your best option for establishing a user's identity, because it's one more thing you need to maintain.

Instead of administering identity with users and groups in IAM under an AWS account - it's better to set up federated authentication with Google Apps or Office365 or some other identity provider that you already maintain with multi-factor auth and a password policy and all that other good stuff. If you don't already have an identity provider, then AWS has its own service, AWS SSO.

While you're at it - you might set up an organization, because whatever you're doing is bound to be wildly successful, and you'll wind up running multiple accounts for your galloping unicorn; an organization helps simplify that administration.

Not the Right Way to Authenticate

If you don't already have an SSO identity provider, and you don't have someone to do it for you, then setting up SSO, AWS federation, and an AWS organization may seem like a lot of work just to manage a small team's access to AWS API's. So let's not do things the right way, but let's not be completely wrong either. We can emulate the right way:

  • require MFA for user authentication
  • enforce a password length and rotation policy
  • require rotation of user access keys
  • associate each user with at least one group
  • associate each group with an IAM role, so that a group member gains access to AWS API's by acquiring temporary credentials via a multifactor-signed call to sts

This setup ensures that access to AWS API's comes either from AWS-managed temporary credentials passed directly to AWS resources like EC2 or lambda via something like the AWS metadata service, or from a user who passes multifactor authentication to acquire a temporary token from sts. Hopefully this will protect our account from being compromised by an exposed secret.
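The ~/.aws/config profiles shown further down let the CLI handle this dance automatically, but the raw call looks something like this (a sketch - the role and MFA ARNs are placeholders); sts returns a temporary key, secret, and session token:

aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/littleware/account/user/littleAdmin \
    --role-session-name admin-session \
    --serial-number arn:aws:iam::123456789012:mfa/yourUser \
    --token-code 123456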

Authorization

Now that we have a mechanism to securely authenticate users and services that want to access AWS API's, how should we decide which privileges to grant different users? Our iamSetup cloudformation stack sets up three groups of users, each associated with its own IAM role:

  • administrator
  • operator
  • developer

We restrict write permission on IAM policies to the administrators, who are trained to enforce least-privilege access. We also restrict which users can disable cloudtrail, because there's rarely a legitimate reason to do that.

The administrator group also shares permissions to create other AWS resources (whatever we want to allow in our account) with the group of operators. I'm not sure if it makes sense to have both an administrator group and an operator group - but one scenario might be that an administrator sets up IAM policies conditional on resource tags for a particular application or whatever, and an operator (maybe a devops specialist on a team) then creates and deletes resources with the appropriate tags - see the sketch below.
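For example - here's a sketch of a tag-scoped statement an administrator might hand the operator role (the tag key and value are hypothetical, following the tagging scheme described below); it allows EC2 instance lifecycle operations only on instances that carry the right project tag:

cat > operatorScope.json <<'EOF'
{
    "Effect": "Allow",
    "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:TerminateInstances"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "ec2:ResourceTag/project": "infrastructure"
        }
    }
}
EOF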

The developer group cannot create new resources directly, but developers do have permissions to deploy new versions of an application (update a lambda, or change the backend on an api gateway, or upgrade an EC2 AMI, or modify S3 objects - that kind of thing).

Finally - each application service has its own IAM role attached to its ECS container or EC2 instances or lambda or whatever. The administrator, operator, and developer roles should only be available to human users; each application's role grants the minimum privilege that service requires.

Tagging

A consistent tagging strategy allows everyone to easily determine the general purpose of a resource and who is responsible for the resource's allocation and billing. Something like this works, but there are many ways to do it.

"Tagging": {
        "TagSet": [
            {
                "Key": "org",
                "Value": "devops"
            },
            {
                "Key": "project",
                "Value": "infrastructure"
            },
            {
                "Key": "stack",
                "Value": "main"
            },
            {
                "Key": "stage",
                "Value": "prod"
            },
            {
              "Key": "role",
              "Value": "cloudformation-bucket"
            }
        ]
    }
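That TagSet happens to be the shape the S3 tagging API expects, so stamping the tags onto (say) the cloudformation templates bucket looks something like this (the bucket name is a placeholder):

aws s3api put-bucket-tagging \
    --bucket YOUR-CLOUDFORMATION-BUCKET \
    --tagging '{"TagSet": [
        {"Key": "org", "Value": "devops"},
        {"Key": "project", "Value": "infrastructure"},
        {"Key": "stack", "Value": "main"},
        {"Key": "stage", "Value": "prod"},
        {"Key": "role", "Value": "cloudformation-bucket"}
    ]}'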

Logs, Metrics, Monitoring, and Alerts

I was slow to understand what's up with cloudwatch and sns, but it's not that complicated. SNS is a pub-sub system - a client publishes something to a topic, and subscribers (lambda functions, e-mail, SQS queues, ...) immediately receive whatever was published - no queueing or flow control - just a way to decouple systems.

Cloudwatch logs is a place to save log ("event") streams. Cloudwatch events lets listeners subscribe to notifications of various events from the AWS control plane. Cloudwatch metrics lets applications publish metrics like load, response time, number of requests, whatever. Cloudwatch alarms fire actions (lambda, SNS publication, ...) triggered by rules applied to metrics, events, and logs.

For example - our cloudformation stack sets up a notifications topic in SNS that our cloudwatch alarms publish to; and we set up alarms to send notifications when changes are made to IAM, or when the root account is accessed, or when an account approaches its budget limit, or when AWS guard duty detects something ... that kind of thing.
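Once the snsNotify stack (deployed below) is up, you can sanity-check the notification path by publishing to the topic directly - a sketch, assuming the notify topic is the first one aws sns list-topics returns:

TOPIC_ARN="$(aws sns list-topics --query 'Topics[0].TopicArn' --output text)"
aws sns publish --topic-arn "$TOPIC_ARN" \
    --subject "test notification" \
    --message "hello from the account notify topic"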

Deploy the stacks

Ok - let's do this thing. As the bootstrap user:

  • set up IAM groups and roles
arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/iamSetup.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/iamSetup.json"

Check if the stack came up successfully:

arun stack events "$LITTLE_HOME/lib/cloudformation/accountSetup/iamSetup.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/iamSetup.json"

If not, then you can delete the stack, fix whatever the problem is, and try again:

arun stack delete "$LITTLE_HOME/lib/cloudformation/accountSetup/iamSetup.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/iamSetup.json"

Similarly, you can modify a successfully deployed stack later:

arun stack update "$LITTLE_HOME/lib/cloudformation/accountSetup/iamSetup.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/iamSetup.json"
  • set up a real user for yourself

The iamSetup stack creates administrator, operator, and developer groups and roles - where members of each group can assume the corresponding role. Use the AWS web console to create a user (with MFA, etc) for yourself, and add the user to the administrator group. Download an access key for the new user, and configure your local ~/.aws/config, so that you can run commands with an administrator token - something like this:

[default]
region = us-east-1
output = json
mfa_serial = arn:aws:iam::012345678901:mfa/yourUser

[profile admin-ohio]
region = us-east-2
role_arn = arn:aws:iam::012345678901:role/littleware/account/user/littleAdmin
source_profile = default
mfa_serial = arn:aws:iam::012345678901:mfa/yourUser

With these credentials in place, you can run commands like the following. These tools will prompt you for an MFA code when necessary to acquire a fresh access token:

export AWS_PROFILE=admin-ohio
aws s3 ls
arun env | grep AWS_

You can now deploy the following stacks as the new administrator user, and delete the bootstrap user.
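When the time comes, retiring the bootstrap user looks something like this (a sketch, assuming the inline policy name from the bootstrap sketch earlier; if you attached a managed policy instead, use detach-user-policy):

aws iam list-access-keys --user-name bootstrap   # note the AccessKeyId
aws iam delete-access-key --user-name bootstrap --access-key-id THE-KEY-ID
aws iam deactivate-mfa-device --user-name bootstrap \
    --serial-number arn:aws:iam::123456789012:mfa/bootstrap
aws iam delete-user-policy --user-name bootstrap --policy-name bootstrap-admin
aws iam delete-user --user-name bootstrap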

  • set up cloudtrail

Update the cloudtrail parameters (misc-stuff/AWS/db/cloudformation/YOUR-ACCOUNT/accountSetup/cloudTrail.json) with a bucket name unique to your account - something like cloudtrail-management-$YourAccountName. You can retrieve your account's alias with aws iam list-account-aliases.
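For example, something like this derives the bucket name from the account alias (assuming an alias has been set):

ALIAS="$(aws iam list-account-aliases --query 'AccountAliases[0]' --output text)"
echo "cloudtrail-management-${ALIAS}"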

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/cloudTrail.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/cloudTrail.json"
  • set up an SNS topic

Remember to set the notification e-mail in the parameters before deploying the SNS stack; or customize the template with a subscriber for whatever notification channel (Slack, SMS, ...) you prefer. You can always add more subscribers to the topic later.

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/snsNotify.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/snsNotify.json"
  • set up alarms

Update the stack parameter files for the alarm stacks (misc-stuff/AWS/db/cloudformation/YOUR-ACCOUNT/accountSetup/*Alarm.json) to reference the new SNS topic (aws sns list-topics) before deploying the following stacks:

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/guardDuty.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/guardDuty.json"

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/budgetAlarm.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/budgetAlarm.json"

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/rootAccountAlarm.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/rootAccountAlarm.json"

arun stack create "$LITTLE_HOME/lib/cloudformation/accountSetup/iamAlarm.json" "$LITTLE_HOME/db/cloudformation/YourAccountNameHere/accountSetup/iamAlarm.json"

Summary

We presented a simple way to secure API access in a new AWS account with authentication, authorization, a tagging strategy, a notification topic in SNS, basic cloudtrail logging, guard duty monitoring, and a few alarms. This simple setup is just a first step in a small team's journey into cloud security. A more sophisticated deployment would leverage AWS organizations and SSO. A larger organization may set up config rules and administrative accounts for centralized logging and alerts - and the journey goes on and on (we haven't even deployed an application yet).

Monday, December 24, 2018

EKS and AWS CNI IP Space management

AWS recently introduced its EKS managed kubernetes service, which manages the kubernetes control plane (API and etcd services), while the cluster owner administers the cluster's worker nodes in a VPC.

One of the features of EKS is the VPC CNI networking plugin, which tightly integrates with AWS VPC networking, so that each kubernetes pod is assigned an IP address from the VPC CIDR range allocated to the worker node subnets. It's important to remember that EKS pods draw from the VPC IP pool when designing the subnets for an EKS worker pool.

When we transitioned our kubernetes infrastructure to EKS, we initially allocated three /24 CIDR subnets for the EKS worker nodes. We assumed that design would allow a cluster of up to 768 nodes (256 nodes per subnet) where each node runs up to 5 pods - so up to 3840 pods with the calico CNI plugin, which manages a separate IP space for pods in an overlay network. With the AWS CNI, however, the pods and nodes both draw IP addresses from the same VPC pool of 768 addresses, so the cluster only supports about 625 pods on 125 nodes.
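Here's a rough back-of-the-envelope sketch of that arithmetic (assuming about 5 pods per node, and accounting for the 5 addresses AWS reserves in every subnet):

# IP math for three /24 worker subnets under the AWS VPC CNI
SUBNETS=3
USABLE_PER_SUBNET=$((256 - 5))              # AWS reserves 5 addresses per subnet
TOTAL_IPS=$((SUBNETS * USABLE_PER_SUBNET))  # 753 usable addresses
PODS_PER_NODE=5
IPS_PER_NODE=$((PODS_PER_NODE + 1))         # each node burns 1 IP itself plus 1 per pod
NODES=$((TOTAL_IPS / IPS_PER_NODE))         # 125 nodes
PODS=$((NODES * PODS_PER_NODE))             # 625 pods
echo "$NODES nodes, $PODS pods"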

JWT Parser

I just published a new JSON web token (JWT) parser app under apps.frickjack.com. There's nothing special about this app - it's just another little vanilla-js plus custom-elements app that continues the evolution of the little-elements and little-apps code.

Thursday, October 11, 2018

little-elements webapp shell

I recently updated apps.frickjack.com to use unbundled javascript modules to load custom elements and style rules. It worked great, except I soon noticed that initial page load would present an un-styled, no-custom-element rendering of the page until the javascript modules finished loading asynchronously... ugh.

I worked around the problem by implementing a simple application shell. The html pages under https://apps.frickjack.com now extend a compile-time nunjucks template, basicShell.html.njk (npm install @littleware/little-elements). The shell hides the page content with a style rule, maybe loads the webcomponentsjs polyfill, and renders a "Loading ..." message if the main.js javascript module doesn't load fast enough. I also extended the main javascript module on each page to signal the shell when it has finished rendering the initial page content. For example - indexMain.js for https://apps.frickjack.com/index.html looks like this:

import './headerSimple/headerSimple.js';

if ( window['littleShell'] ) {
    window['littleShell'].clear();
}

The index.html nunjucks template looks like this:

{% extends "little-elements/lib/styleGuide/shell/basicShell.html.njk" %}

{% block head %}
<title>frickjack.com</title>
{% endblock %}

{% block content %}
    <lw-header-simple title="Home">
    </lw-header-simple>
    ...
   <script src="{{ jsroot }}/@littleware/little-apps/lib/indexMain.js" type="module"></script>
{% endblock %}

A visitor to https://apps.frickjack.com may now briefly see a "Loading" screen on the first page load, but subsequent visits generally load fast enough from cache that the shell doesn't bother showing it. I can improve on this by extending the shell with a service worker - that will be my next project.

I envy gmail's loading screen that renders a crazy animation of a googley envelope. Fancy!

Friday, August 03, 2018

jasminejs tests with es2015 modules

Running jasminejs tests of javascript modules in a browser is straightforward, but requires small adjustments, since a module loads asynchronously, while jasmine's default runtime assumes code loads synchronously. The approach I take is the same as the one used to test requirejs AMD modules:

  • customize jasmine boot, so that it does not automatically launch test execution on page load
  • write a testMain.js script that imports all the specs for your test suite, then launches jasmine's test execution

For example, this is the root test suite (testMain.ts) for the @littleware/little-elements module:

import './test/spec/utilSpec.js';
import './arrivalPie/spec/arrivalPieSpec.js';
import './styleGuide/spec/styleGuideSpec.js';
import {startTest} from './test/util.js';

startTest();

The startTest() function is a little wrapper that detects whether karmajs is the test runtime, since the test bootstrap process is a little different in that scenario. The karmajs config file should also annotate javascript module files with a module type. For example, here is an excerpt from little-element's karma.conf.js:

    files: [
      'lib/test/karmaAdapter.js',
      { pattern: 'lib/arrivalPie/**/*.js', type: 'module', included: false },
      { pattern: 'lib/styleGuide/**/*.js', type: 'module', included: false },
      { pattern: 'lib/test/**/*.js', type: 'module', included: false },
      { pattern: 'lib/testMain.js', type: 'module', included: true },
      { pattern: 'node_modules/lit-html/*.js', type: 'module', included: false }
    ],

Feel free to import the @littleware/little-elements module into your project if it will be helpful!

Tuesday, July 31, 2018

little-elements

Last week I rolled out an update to apps.frickjack.com that changes the plumbing of the site.

  • The javascript is now organized as unbundled es2015 modules
  • The site also leverages javascript modules to manage html templates and CSS with lit-html

I also set up my first module on npm (here) to track UX components that could be shared between projects - which was fun :-)

wait for kubernetes pods

First, there's probably a better way to do this with kubectl rollout, but I didn't know that existed till last week.

Anyway ... kubernetes' API is asynchronous, so an operator who issues a series of kubernetes commands to update various deployments, and wants to wait till all the pods in those deployments are up and running, can take advantage of a script like kube-wait4-pods:

(
    # If new pods are still rolling/starting up, then wait for that to finish
    COUNT=0
    OK_COUNT=0
    # Don't exit till we get 2 consecutive readings with all pods running.
    while [[ "$OK_COUNT" -lt 2 ]]; do
      g3kubectl get pods
      if [[ 0 == "$(g3kubectl get pods -o json |  jq -r '[.items[] | { name: .metadata.generateName, phase: .status.phase, waitingContainers: [ try .status.containerStatuses[] | { waiting:.state|has("waiting"), ready:.ready}|(.waiting==true or .ready==false)|select(.) ]|length }] | map(select(.phase=="Pending" or .phase=="Running" and .waitingContainers > 0)) | length')" ]]; then
        let OK_COUNT+=1
      else
        OK_COUNT=0
      fi
      
      if [[ "$OK_COUNT" -lt 2 ]]; then
        echo ------------
        echo "INFO: Waiting for pods to exit Pending state"
        let COUNT+=1
        if [[ COUNT -gt 30 ]]; then
          echo -e "$(red_color "ERROR:") pods still not ready after 300 seconds"
          exit 1
        fi
        sleep 10
      fi
    done
)
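By the way - with a recent kubectl, the rollout machinery mentioned above accomplishes much the same thing; a sketch:

# wait for every deployment in the current namespace to finish rolling out
for deployment in $(kubectl get deployments -o name); do
  # blocks until the rollout completes or the timeout expires
  kubectl rollout status "$deployment" --timeout=300s || exit 1
done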

For example - this Jenkins pipeline deploys a new stack of services to a QA environment, then waits till the new versions of pods are deployed before running through an integration test suite.

...
    stage('K8sDeploy') {
      steps {
        withEnv(['GEN3_NOPROXY=true', "vpc_name=$env.KUBECTL_NAMESPACE", "GEN3_HOME=$env.WORKSPACE/cloud-automation"]) {
          echo "GEN3_HOME is $env.GEN3_HOME"
          echo "GIT_BRANCH is $env.GIT_BRANCH"
          echo "GIT_COMMIT is $env.GIT_COMMIT"
          echo "KUBECTL_NAMESPACE is $env.KUBECTL_NAMESPACE"
          echo "WORKSPACE is $env.WORKSPACE"
          sh "bash cloud-automation/gen3/bin/kube-roll-all.sh"
          sh "bash cloud-automation/gen3/bin/kube-wait4-pods.sh || true"
        }
      }
    }
...

BTW - the code referenced above is a product of the great team of developers at the Center for Data Intensive Science.

Saturday, April 07, 2018

nginx reverse-proxy tricks for jwt and csrf

I've been doing devops work over the last few months at CDIS. One of the tasks I worked on recently was to extend the nginx configuration (on github as a kubernetes configmap here) for the gen3 data commons stack.

First, we added end-user data to the proxy log by extracting user information from the JWT in the auth cookie. We took advantage of nginscript (a subset of javascript) to import some standard javascript code (also in the kubernetes configmap) to parse the JWT.

function userid(req, res) {
  var token = req.variables["access_token"];
  var user = "uid:null,unknown@unknown";

  if (token) {
    // note - raw token is secret, so do not expose in userid
    var raw = atob((token.split('.')[1] || "").replace(/-/g, '+').replace(/_/g, '/'));
    if (raw) {
      try {
        var data = JSON.parse(raw);
        if (data) {
          if (data.context && data.context.user && data.context.user.name) {
            user = "uid:" + data.sub + "," + data.context.user.name;
          }
        }
      } catch (err) {}
    }
  }
  return user;
}

Next, we added a CSRF guard that verifies that if a CSRF cookie is present, then it matches a CSRF header. The logic is a little clunky (in github here), because of the restrictions on conditionals in nginx configuration, but it basically looks like the sample configuration below - where $access_token is the JWT auth token (either from a header for non-web clients, or a cookie for web browser clients), and an "ok-SOMETHING" $csrf_check is only required for browser clients. We do not enforce the HTTP-header based CSRF check in the proxy for endpoints that may be accessed by traditional HTML form submissions (which embed the token in the form body), or for endpoints that are accessed by third party browser frontends (like jupyterhub) that may implement a CSRF guard in a different way. Finally, there's a little bit of logic that auto-generates a CSRF cookie if it's not present - a javascript client in the same domain can read the cookie to get the value to put in the X-CSRF header.

          set $access_token "";
          set $csrf_check "ok-tokenauth";
          if ($cookie_access_token) {
              set $access_token "bearer $cookie_access_token";
              # cookie auth requires csrf check
              set $csrf_check "fail";
          }
          if ($http_authorization) {
              # Authorization header is present - prefer that token over cookie token
              set $access_token "$http_authorization";
          }

          #
          # CSRF check
          # This block requires a csrftoken for all POST requests.
          #
          if ($cookie_csrftoken = $http_x_csrf_token) {
            # this will fail further below if cookie_csrftoken is empty
            set $csrf_check "ok-$cookie_csrftoken";
          }
          if ($request_method != "POST") {
            set $csrf_check "ok-$request_method";
          }
          if ($cookie_access_token = "") {
            # do this again here b/c empty cookie_csrftoken == empty http_x_csrf_token - ugh  
            set $csrf_check "ok-tokenauth";
          }
          ...
          location /index/ {
              if ($csrf_check !~ ^ok-\S.+$) {
                return 403 "failed csrf check";
              }
              if ($cookie_csrftoken = "") {
                add_header Set-Cookie "csrftoken=$request_id$request_length$request_time$time_iso8601;Path=/";
              }

              proxy_pass http://indexd-service/;
          }
          ...

Thursday, November 23, 2017

Jenkins Backup Pipeline

I recently set up a Jenkins CICD service to complement the Travis based automation already in use where I work. I've worked with job-based Jenkins workflows in the past - where we set up chains of interdependent jobs (ex: build, publish assets, deploy to dev environment, ...), but I took this opportunity to adopt Jenkins' new (to me) Pipeline pattern, and I'm glad I did.

A Jenkins pipeline defines (via a groovy DSL) a sequence of steps that execute together in a build under a single Jenkins job. Here's an example pipeline we use to back up our Jenkins configuration to S3 every night.

#!groovy

pipeline {
  agent any

  stages {
    stage('BuildArchive'){
      steps {
        echo "BuildArchive $env.JENKINS_HOME"
        sh "tar cvJf backup.tar.xz --exclude '$env.JENKINS_HOME/jobs/[^/]*/builds/*' --exclude '$env.JENKINS_HOME/jobs/[^/]*/last*' --exclude '$env.JENKINS_HOME/workspace' --exclude '$env.JENKINS_HOME/war' --exclude '$env.JENKINS_HOME/jobs/[^/]*/workspace/'  $env.JENKINS_HOME"
      }
    }
    stage('UploadToS3') {
      steps {
        echo 'Upload to S3!'
        sh 'aws s3 cp --sse AES256 backup.tar.xz s3://cdis-terraform-state/JenkinsBackup/backup.$(date +%u).tar.xz'
      }
    }
    stage('Cleanup') {
      steps {
        echo 'Cleanup!'
        sh 'rm -f backup.tar.xz'
      }
    }
  }
  post {
    success {
      slackSend color: 'good', message: 'Jenkins backup pipeline succeeded'
    }
    failure {
      slackSend color: 'danger', message: 'Jenkins backup pipeline failed'
    }
    unstable {
      slackSend color: 'danger', message: 'Jenkins backup pipeline unstable'
    }
  }
}
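The matching restore is mostly a tar extract; a sketch (pick the day's archive - backup.1.tar.xz through backup.7.tar.xz, from date +%u):

# tar strips the leading '/' from $JENKINS_HOME paths at create time,
# so unpack relative to / to land the files back in place
aws s3 cp s3://cdis-terraform-state/JenkinsBackup/backup.1.tar.xz /tmp/backup.tar.xz
sudo tar xJf /tmp/backup.tar.xz -C /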

If you are a Jenkins user, then take the time to give Pipelines a try. If you're also using github or bitbucket - then look into Jenkins' support for organizations, which nicely supports pull-request based workflows. Also try the new Blue Ocean UI - it's designed with pipelines in mind.