Packer/Ansible: Unable to acquire dpkg lock

Today I ran into a rather odd issue attempting to patch a base image using Ansible and Packer. Randomly and sporadically, my playbook would fail with the error of:

Could not get lock /var/lib/dpkg/lock-frontend

If you ever try to run Ansible on Ubuntu 16.10 and later, be aware that Unattended Upgrades is enabled by default. On boot of new Packer bake instances, I noticed that it would sometimes lock apt to do security updates by default. I spent a good part of my day tracking this down as to why sometimes it would work and other times it would not; turns out it was a race condition (could apt finish faster then next step). Seeing as it’s 2018 and we should not be afraid of security fixes, I didn’t want to disable this because this is useful for security and such.

To get around this, I created a role called prerun, which does the following task:

# Check for unattended-upgrades
- name: Wait for automatic system updates to complete
  shell: while pgrep unattended; do sleep 10; done;

After including this in roles that used apt, my error went away. One of my builds took almost 30 seconds to get past this; which would have otherwise failed. Hope this helps another poor soul out there. 🙂



Another way as I was shown with Packer (to avoid adding additional Ansible roles) is to run a pre and post script in between your Ansible run! 

In your base Packer JSON provisioners section, you would do:

  "script": "scripts/",
  "type": "shell"
  "extra_arguments": [
  "playbook_dir": "playbooks",
  "playbook_file": "playbooks/example.yml",
  "type": "ansible-local"
  "script": "scripts/",
  "type": "shell"

In, you would need a section to do whatever you need (most likely install ansible and such):

# Ensure to disable u-u to prevent breaking later
sudo systemctl mask unattended-upgrades.service
sudo systemctl stop unattended-upgrades.service
# Ensure process is in fact off: echo "Ensuring unattended-upgrades are in fact disabled" while systemctl is-active --quiet unattended-upgrades.service; do sleep 1; done

Finally, in (and whatever else you need, like delete ansible dir and such):

sudo systemctl unmask unattended-upgrades.service
sudo systemctl start unattended-upgrades.service


Another solution could be to add this in you Ansible roles before package installation:

- name: Wait for /var/lib/dpkg/lock-frontend to be released   
shell: while lsof /var/lib/dpkg/lock-frontend ; do sleep 10; done;

Thanks Gordon Kirkland for the additional solution!

Hope one of these many solutions helps others stuck with the same problem!

AWS Summit: GameDay Experience

Today, I participated in a local AWS Summit’s free GameDay event here in Chicago. Going into this, I was not sure what I was getting myself into. The details of the event sounded intriguing; split up and compete to see who can create the best scalable solution.

The event starts by splitting you up into groups of 3-4 individuals. It pairs beginners with advanced AWS users to not only help balance out the room but also teach at the same time. I thought that was a really cool and neat concept; learn from one another. Walking into the room, I voiced a confident “advanced” but leaving it left me hungry for more and full of questions. It was well worth my time and it was loads of fun.

First, the CEO of Unicorn Rentals walks in and gives the most “inspiring” talk that basically can be summed up to: “I told folks on Good Morning America that we are live, good luck, don’t mess up, and make me a lot of money.” Fortunately for us, we were given a comical half-assed “Runbook” of the architecture and how it works; sadly this is probably more documentation than some projects I have worked on in my professional career! From coffee stains to passwords scribbled all over the place, I knew from the get go that this was going to be a fun little ride.

The "Runbook"

First thing we did from logging in was delete all those scratched out accounts, and change the password on the root account! Once we secured the account, we needed to register our team on the scoreboard. This was done by creating a root TXT record in the hosted zone with our team name. It appears the organizers listened for the TXT records and registered teams off that; kinda cool! From here, the fun begins and our application begins taking load.

When we were handed the account, we were told that it is working today, but we need to make it better to handle increased load in ~30 minute increments. First thing we checked was to see how our application was configured. We noticed that the root A record was pointing to an individual EC2 instance with a static IP and not to a ELB/ASG. We corrected this issue by creating an ALIAS record to the ELB for the root record and associated the ELB with the ASG. Next, we were told to create a simple deployment architecture using only User Data and Auto Scaling Groups. We started by copying the existing broken launch configuration, fixed it so it got the new code base and also didn’t have a shutdown command at the end (WHY!!!), and then we proceeded to launch with a scale of 2 instances (hard-coded for now). Keep in mind this was only 30 minutes into the competition.

From here, we faced various issues such as:

  • The “Network Engineer” messed up our ACLs with a bad script
  • The same “Network Engineer” killed our main route table with no default Internet Gateway.
  • You were met with other nefarious and “accidental” issues along the way and it was great to simulate the Oops moments.

Aside from these moments, we began to notice that our subnet for launching instances was limited to only a /28 CIDR block! Load by this point was starting to jump to almost double digits; This just wouldn’t do! We fixed this by creating three subnets (one in each AZ) and associated them with the ELB, and autoscaling groups. With ample space, we can now focus on improving our scaling policy. At first we scaled based on network requests, but later on we determined we should have used the ELB Alarms on latency and scaled off of this. With a solid policy, enough servers to handle traffic, we figured we are good to go! Wrong, so wrong, I can’t believe how wrong we were.

Despite having everything in order, our servers still couldn’t keep up with load! What was going on? There was enough servers, they weren’t crashing, no high load. What could be the problem? We were stuck on this for sometime until we logged into the instance and noticed that the application handled one request with an average latency of about 4 seconds. Playing with the binary, we noticed there was an Elasticache (memcached) option we could leverage. We sprang to the occasion and built up a small distributed memcache cluster and configured our app to use it. Now our average request time was ~.2 seconds! Sweet!

However, despite having fast response times, we were still getting failures. I didn’t make any sense, until we ran the binary and watched it. It appeared that it would only handle one connection at a time, and then reject any other connections as it was handling that one request. This ended up taking the most time, but we couldn’t implement a solution in time. Speaking with some of the Solutions Architects, they mentioned some teams used Docker to run multiple on the same host, and some folks found out that you can run about ~4 binaries concurrently on different ports and handle the load. While the competition winded down, I was implementing a solution that involved running 4 instances of the binary on the same host leveraging IpTables to do round robin; unfortunately, I didn’t get a chance to see it in practice but I was confident this solution would have worked.

Change Management...HAH

All in all, it was a great experience. No my team did not win, but I got to explain and teach others about AWS and some best practices. Towards the end it was really cool to see a team effort as we started tackling harder and harder challenges, and overall I think we all walked away with a little bit more knowledge than we came in. At one point, I had 10 people sitting around me listening to me explain how we do deployments at gogo, our challenges and pitfalls with AWS, and other random topics. Even some of the Solutions Architects came by to sit and discuss, and joked “I needed stadium seating” with the amount of people around me! At that moment it made me realize we do some pretty wicked stuff at work. If anything, this re-energized me to get back and implement some of the things I learned at work!

With all this being said, I really got to thank the AWS Summit GameDay organizers! This is great, and loads of fun. Please keep doing these kinds of events; especially in other cities!