Transparent Huge Pages, or why DBAs are great

Here at Motus, we have a pretty cyclical traffic pattern.  Most of the month our load is fairly even, and then for 4 days a month, our traffic goes up 10x.  And when 3 of those 4 days are holidays, that 4th day is even busier.  This month we noticed that our database server was working harder than usual, so we decided to do a little query tuning.

Continue reading “Transparent Huge Pages, or why DBAs are great”

Where’d my space go?

It’s never good when “FATAL:Running out of space (100%)” pops up in your production alerts Slack channel, but there it was.  A quick check revealed that the root file system was, in fact, full:

[root@gluster-01-ao ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv
                      27G   27G     0 100% /

No room at the inn.

Usually in this case, a quick (or not-so-quick) ‘du -shx’ reveals some giant log file that logrotate missed, or a pile of old Docker images.  Today, however, I got nothing:

[root@gluster-01-ao ~]# du -shx /
2.0G /

Huh?  That seems odd.  Where’s my missing 25G of data?  The next usual suspect is a process hanging on to a file that’s already been deleted; ‘lsof | grep deleted’ will show any such zombie files.  Nada.  We even rebooted the server, but it came back and was still full.
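As an aside, the deleted-but-still-open situation is easy to reproduce if you’ve never seen it.  Here’s a small sketch (Linux-only; the file name comes from mktemp) showing the kernel flagging such a file in /proc:

```shell
# Reproduce a "zombie" file: open it, delete it, then look in /proc.
tmp=$(mktemp)
tail -f "$tmp" &      # hold the file open in the background
pid=$!
sleep 1               # give tail a moment to open it
rm "$tmp"             # the name is gone, but the space is not freed yet
ls -l "/proc/$pid/fd" | grep deleted   # the open fd shows up as "(deleted)"
kill "$pid"
```

Until the process closes the descriptor (or dies), df still counts that space as used even though du can’t see the file.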

After some more in-depth trawling of Stack Overflow, we finally came to the answer: files hidden under mount points.  We had several GlusterFS file shares mounted on this server that we used to do rsyncs between environments.  We unmounted them one by one, and as soon as I unmounted one of them, voila:

[root@gluster-01-ao ~]# du -shx /
27G /

As far as we can tell, one of the Gluster mounts failed one day but the rsync script continued to dump files in that mount directory.  And when the Gluster mount came back, it essentially hid the files that were there.
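A trick we learned afterwards, for the record: you don’t actually have to unmount anything.  A bind mount of / gives you a second view of the root file system with no child mounts on top of it, so anything buried under a mount point is visible there.  A sketch, assuming root and a hypothetical mount point at /mnt/gluster:

```shell
# Bind-mount the root file system to a scratch directory.  The bind
# view has no child mounts, so files hidden underneath them appear.
mkdir -p /tmp/rootview
mount --bind / /tmp/rootview
du -sh /tmp/rootview/mnt/gluster   # hypothetical: inspect under the mount
umount /tmp/rootview
```

This is handy when the covering mount is in active use and can’t be taken offline.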

Now I know where I can hide data on my Linux servers if I need to…

Standardizing Java builds with Docker

Overview

One of the constant challenges with team development is ensuring that all the developers are building, testing, and deploying code in a reasonably standard way. Having a standard means that developers can shift teams easily, new developers can come up to speed more quickly, and problems are easier to diagnose.

Over the years, many, many technologies have been used to try to solve this problem. At Motus, we’ve decided to use a custom-built Docker container as our standard build tool, and we’ve been pleasantly surprised at the benefits that come from that. This blog post will cover our setup as well as some of the benefits that come from it.

Continue reading “Standardizing Java builds with Docker”

Port? What port?

Recently I was playing around with building an express.js app and running it in Docker. I used the standard npm init to create my app, wrote my code, tested it, and was happy.

Then I tried to deploy it in Marathon. No matter what I did, I couldn’t get it to load. It was as if the port forwarding just didn’t work. I even used docker-enter to get inside the container, and couldn’t connect to Node – but it was running.

As a last resort, I went poking around the auto-generated code that express creates, and I saw this line:

/**
 * Get port from environment and store in Express.
 */

var port = normalizePort(process.env.PORT || '3000');
app.set('port', port);

and the light bulb finally went off.

Marathon sets the PORT environment variable when launching a Docker container so that the app knows which port it’s being proxied on (helpful for service registration, etc.). In this case, Express was picking up that environment variable and listening on that port instead of the expected 3000. I just removed that check and we’re good to go!
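You can see the same fallback logic outside Express entirely; here’s a minimal shell sketch of what the generated code does (the PORT value is a hypothetical one Marathon might inject):

```shell
# Express's generator emits: process.env.PORT || '3000'.
# Marathon injects PORT at launch, so the fallback never fires:
PORT=31002                      # hypothetical value Marathon might set
app_port="${PORT:-3000}"
echo "listening on $app_port"   # prints "listening on 31002", not 3000
```

Locally, where PORT is unset, the fallback kicks in and you get 3000 – which is why everything worked on my machine.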

Using Ansible for Marathon/Chronos deployments

Here at Motus, Ansible has long been our tool of choice for application deployments and bulk ad-hoc sysadmin commands (we use Puppet for configuration management). As we have moved from “traditional” application deployments – shipping binaries to multiple servers, running local deployment commands, and verifying – to deployments on Marathon and Chronos, where everything happens over HTTP, we have had to tweak our Ansible usage somewhat. This post covers some of the things we’ve done to make Ansible work well in this environment.
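As a taste of what an HTTP-based deployment looks like, here is a hedged sketch of an Ansible task PUTting an app definition to Marathon’s REST API; the host, app id, and file name are hypothetical placeholders, not our actual setup:

```yaml
# Hypothetical: push a Marathon app definition over HTTP with the uri module.
- name: Deploy app definition to Marathon
  uri:
    url: "http://marathon.example.com:8080/v2/apps/myapp"
    method: PUT
    body_format: json
    body: "{{ lookup('file', 'myapp.json') }}"
    status_code: [200, 201]
```

Chronos jobs follow the same pattern against its own HTTP endpoints.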

Continue reading “Using Ansible for Marathon/Chronos deployments”

nginx, proxy_pass, and AWS Elastic Load Balancing

Nginx is an amazing web server – after years of working with Apache, nginx just feels more fluid and natural. It’s also fast as hell. But some of the things that make it fast can cause hidden issues if you’re not paying attention.

We have a setup that involves a Docker container running nginx that uses the proxy_pass directive to forward requests to several of our running services:

    location /webservice {
        proxy_pass http://webservice.mydomain.com/webservice;
    }  

Works like a charm. But when we recently moved our staging environment to AWS, we noticed that almost every night, our web server would just stop responding to those proxy requests. Relaunching the web server container would fix the issue, so it wasn’t a code thing.

Doing my random googling, I eventually ran into this article that described what we were seeing.

It turns out that AWS changes the IP addresses of its load balancers on a pretty regular basis, but nginx (by default) resolves a proxy_pass hostname only once, at startup, and never re-checks the DNS TTL of the upstream server. In order to make it re-resolve, you have to do this:

    location /webservice {
        set $webservice_upstream "webservice.mydomain.com";
        resolver 8.8.8.8 valid=30s;
        proxy_pass http://$webservice_upstream/webservice;
    }  

By putting a variable in the proxy_pass URL, nginx decides that it needs to re-resolve the DNS entry at request time. The resolver directive tells it which DNS servers to use, and the optional valid parameter overrides the TTL. After we deployed this, things worked like a charm!

SSL 2.0? No Problem

Let’s say, hypothetically, that you have a customer that needs to use an old browser with SSL 2.0 enabled for some of their legacy applications.  Let’s also say that one of your third-party partners that provides a key part of your application doesn’t support SSL 2.0 (and, let’s be honest, why should they?).  You’re stuck in a difficult spot.  Here’s how we got out of it:

Continue reading “SSL 2.0? No Problem”