During the summer of 2018, I worked as a Software Engineering Intern on Amazon Web Services (AWS) Kinesis. AWS Kinesis is a fully managed service that lets customers collect, process, and analyze real-time streaming data so they can get timely insights and react quickly to new information. I helped develop internal scripting tools, update real-time metric dashboards, and contribute to internal documents, but my work primarily focused on developing features for measuring customer usage. That work supported the new enhanced fan-out feature, which gives each stream consumer its own dedicated read throughput. By the end of my internship, my code had been deployed internationally, reaching the millions of customers using Kinesis.
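None of the internal code is something I can share, but enhanced fan-out does surface through the public Kinesis API. As a rough illustration of what it looks like from a customer's side, here is a minimal boto3 sketch that registers a dedicated-throughput consumer; the stream and consumer names are purely hypothetical.

```python
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name, purely for illustration.
summary = kinesis.describe_stream_summary(StreamName="example-stream")
stream_arn = summary["StreamDescriptionSummary"]["StreamARN"]

# Enhanced fan-out: register a consumer that receives its own dedicated
# read throughput per shard instead of sharing the stream's read limit.
consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName="example-consumer",
)
print(consumer["Consumer"]["ConsumerARN"])
```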

After graduating with my master's degree from Georgia Tech, I immediately began working full time at AWS Kinesis. During my time there, I learned a variety of skills and developed features that impacted both internal and external customers. I outline some of my thoughts and accomplishments below.

One of the skills I learned during my time at Kinesis is how to scale infrastructure. When dealing with an ever-growing amount of data and users, scaling out infrastructure to keep up with the velocity of code changes and new features quickly becomes extremely important. On my team, we needed infrastructure to safely deploy to 25+ regions with at least 3 Availability Zones (AZs) each, improve our continuous integration and continuous deployment (CI/CD) process for faster releases, create auto-rollback conditions hooked directly into our monitoring infrastructure, and maintain a set of robust canary tests in production that replicated customer behavior. Any new feature or code change had to adhere to these infrastructure requirements in order to limit the impact and blast radius of a deployment.
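The internal deployment tooling itself isn't something I can show, but the idea behind a production canary that replicates customer behavior can be sketched with the public APIs: put a record, read it back, and emit success and latency metrics that monitors (and rollback conditions) can hook into. This is a minimal sketch assuming a hypothetical canary stream and metric namespace, not the actual implementation.

```python
import time
import boto3

kinesis = boto3.client("kinesis")
cloudwatch = boto3.client("cloudwatch")

def run_canary(stream_name="canary-stream"):
    """Mimic a customer: put a record, then confirm it can be read back."""
    start = time.time()
    put = kinesis.put_record(
        StreamName=stream_name,
        Data=b"canary-payload",
        PartitionKey="canary",
    )

    # Read back from the shard we just wrote to, starting at our record.
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=put["ShardId"],
        ShardIteratorType="AT_SEQUENCE_NUMBER",
        StartingSequenceNumber=put["SequenceNumber"],
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator, Limit=1)["Records"]
    success = any(r["Data"] == b"canary-payload" for r in records)

    # Publish metrics that monitors and auto-rollback conditions can alarm on.
    cloudwatch.put_metric_data(
        Namespace="Canary/Kinesis",  # hypothetical namespace
        MetricData=[
            {"MetricName": "PutGetSuccess", "Value": 1.0 if success else 0.0},
            {
                "MetricName": "PutGetLatencyMs",
                "Value": (time.time() - start) * 1000,
                "Unit": "Milliseconds",
            },
        ],
    )
    return success
```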

There were also several tools I came to appreciate: the vast number of metrics and dashboards, access to a large number of old and new design documents, and a large wiki giving quick access to information such as Standard Operating Procedures (SOPs) and collections of dashboards. At Amazon, metrics and dashboards are everywhere. Every application is expected to have monitors that detect all sorts of failures and changes in latency, and some are captured automatically as part of the monitoring infrastructure. All of this leads to extremely detailed dashboards and the ability to catch impactful events before they reach customers. This is on top of having design documents and wiki information readily available. Whether it's old design patterns or new technologies, having all that information at hand is quite empowering.
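The internal dashboards themselves can't be reproduced here, but the "alarm on latency shifts before customers notice" idea maps directly onto public CloudWatch alarms. A minimal sketch, using the hypothetical canary metric from the earlier example and illustrative thresholds:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm on the canary latency metric. Alarms like this are the
# kind of signal that dashboards and deployment rollback conditions can be
# wired to.
cloudwatch.put_metric_alarm(
    AlarmName="canary-put-get-latency-high",
    Namespace="Canary/Kinesis",
    MetricName="PutGetLatencyMs",
    Statistic="Average",
    Period=60,                      # evaluate one-minute windows
    EvaluationPeriods=5,            # require 5 consecutive breaches
    Threshold=500.0,                # milliseconds; illustrative value
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="breaching",   # a silent canary is itself a failure
)
```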

To wrap up, I list below some of the more significant projects I have been a part of. Each project came with a wealth of experience, and I'm proud of what I was able to accomplish during my time on Kinesis.

  • Led and developed the scaling component of On-Demand, a new capacity mode for Kinesis Data Streams that accommodates customer workloads as they ramp up or down (a brief sketch of the public-facing APIs follows this list)
  • Developed and load-tested a method of priming thousands of proxy host caches with customer metadata in a scalable and deployment-safe way
  • Developed and refactored control plane code to increase throughput and reduce complexity by leveraging new DynamoDB (DDB) features when updating customer resources
  • Developed and refactored an internal component to raise the default scaling limit of the UpdateShardCount API from 500 shards to 10,000 shards
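Both On-Demand and the UpdateShardCount work above surface through the public Kinesis API, even though the internal scaling components themselves aren't shown here. A minimal sketch of what they look like from a customer's side, with hypothetical stream names:

```python
import boto3

kinesis = boto3.client("kinesis")

# On-Demand capacity mode: the service scales shards automatically as the
# workload ramps up or down, so no shard count is specified at creation.
kinesis.create_stream(
    StreamName="example-on-demand-stream",       # hypothetical name
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

# Provisioned streams instead scale explicitly via UpdateShardCount, up to
# the raised default quota mentioned above (subject to per-call limits such
# as not more than doubling the current shard count in a single request).
kinesis.update_shard_count(
    StreamName="example-provisioned-stream",     # hypothetical name
    TargetShardCount=10_000,
    ScalingType="UNIFORM_SCALING",
)
```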