
Shipping Through the Storm: The YOUGotaGift Infra Story


MARCH 31, 2025


Shameem Khalid

Director of Cloud Engineering


Ashin KN

Chief Technology Officer


The YOUGotaGift Infrastructure Journey


From one server to multi-cloud: A DevOps rollercoaster ride. What does it take to build an infrastructure that serves millions of customers across the globe while navigating regional regulations, shifting cloud landscapes, and a global pandemic?

At YOUGotaGift, our infrastructure journey has been anything but ordinary. Over the past decade, we've evolved from a single-server setup to a sophisticated multi-cloud architecture, each transformation pushing us closer to our goal of building the most reliable and responsive private network for branded payments.

Buckle up as we take you behind the curtain of our technical evolution!

The Scrappy Startup Days

Picture the early days of 2013, a time when cloud computing was still maturing and startup budgets were razor-thin. Our first server setup was so bare-bones it would make today's DevOps engineers cringe. One server. One database. Zero high availability.

When something crashed at 2 AM, whoever from our tight-knit crew saw the alert first would jump online to fix it, still shaking off sleep but laser-focused on getting us back online.

Like many startups, we weren't building for scale; we were building to survive. Our entire focus was on solving the immediate business problem and proving our concept worked.

Our early prototype found its first home on Rackspace. It wasn't glamorous, but it worked... until one day, it dramatically didn't. And that's when our real infrastructure journey began.

The AWS Migration: Our First Big Bet


The growing pains hit faster than we expected. As customer adoption increased, we faced a critical decision: scale up our existing infrastructure or reimagine it entirely?

We chose the latter, making our first strategic infrastructure bet: migrating to AWS.

AWS offered us something we desperately needed: reliability at scale and solid managed services that our previous setup lacked.

We embraced Elastic Beanstalk as our compute platform, a quick platform-as-a-service option, along with the ELK stack for logging, Sentry for error tracking, and AWS RDS for our database needs.

The setup handled the workload well until we faced our first major scaling challenge: unpredictable traffic surges from an enterprise client. The client would place a large-volume eGift card order, with delivery scheduled via email or SMS. Once recipients received their messages and clicked on the eGift card links, they would surge onto our platform all at once, creating a sudden spike in traffic. These massive spikes were difficult to predict: we had no visibility into exactly when the client would place an order or how quickly recipients would click through.

What we needed was to design an approach based on workload patterns.

For example, for one particular product, most of the hits landed on a single page, and customers took some time to navigate to other pages. Interesting, right? One of the best approaches in that situation is to serve the page via a CDN provider, which has a far larger surface for absorbing requests before channeling them to backend servers. Is it scalable? By definition, yes.

Unfortunately, due to application constraints, we could not do this. What would be the next best approach?

We made educated guesses about the expected load and pre-scaled additional nodes to absorb the initial spike until adaptive scaling finished its job. Customers experienced slightly delayed responses, but it worked.
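Pre-warming capacity ahead of a known bulk send can be expressed as a scheduled scaling action. A minimal sketch using the AWS CLI, where the group names, action names, times, and sizes are all illustrative rather than our actual settings:

```
# Hypothetical sketch: raise minimum capacity shortly before a client's
# scheduled bulk send, then let adaptive (target-tracking) scaling take over.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name web-asg \
  --scheduled-action-name prewarm-bulk-order \
  --start-time "2025-03-31T08:45:00Z" \
  --min-size 10 \
  --desired-capacity 10

# Scale back down once the spike has subsided.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name web-asg \
  --scheduled-action-name cooldown-bulk-order \
  --start-time "2025-03-31T12:00:00Z" \
  --min-size 2 \
  --desired-capacity 2
```

The catch, of course, is that this only works when you know the send time in advance, which we often didn't.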

Finally the day came when traffic exceeded the predicted load, and Elastic Beanstalk's scaling fell behind, with 20% of requests failing. Why did the failures happen?

The adaptive scaling time was considerably high. The culprit was our Django application: its slow initialization was caused by the real-time, one-by-one installation of dependency packages. Each new Elastic Beanstalk instance had to undergo this process before it could handle the increased requests.

The next day became an emergency hack day: we created prebuilt Amazon Machine Images (AMIs) with our entire Django environment pre-installed and configured. The images included all Python packages and optimized web server configs. When Elastic Beanstalk needed to scale, it would simply clone these ready-to-go images, reducing startup time to under one minute.
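Today, baking an image like this is typically automated with a tool such as Packer. A hedged sketch of the idea, where the AMI names, region, base image, and install steps are illustrative and not our actual build:

```hcl
source "amazon-ebs" "django_base" {
  ami_name      = "django-base-{{timestamp}}"
  instance_type = "t3.medium"
  region        = "me-south-1"            # illustrative region
  source_ami    = "ami-0123456789abcdef0" # placeholder base AMI
  ssh_username  = "ec2-user"
}

build {
  sources = ["source.amazon-ebs.django_base"]

  provisioner "file" {
    source      = "requirements.txt"
    destination = "/tmp/requirements.txt"
  }

  # Install every dependency at bake time so new instances boot
  # ready to serve instead of running pip on first start.
  provisioner "shell" {
    inline = [
      "sudo yum install -y python3 nginx",
      "sudo pip3 install -r /tmp/requirements.txt",
    ]
  }
}
```

The resulting AMI ID is then set as the environment's custom image, so every scale-out event clones a fully provisioned machine.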

Did it work? Yes, it did: our error rates dropped from 20% to under 0.2%. But in the fast-moving world of cloud infrastructure, quick fixes rarely last. We needed something more scalable and more resilient, and that's when we faced our next challenge.

The Containerization Crisis


Unpredictable traffic spikes were becoming our new normal. Every marketing campaign or major corporate client bulk order would push our Beanstalk setup to its limits. The scaling was painfully slow, which left our customers staring at loading screens for far too long.

Maintaining the image hack we had implemented in Elastic Beanstalk was also a complex process.

We faced a critical crossroads: either continue fine-tuning Beanstalk (the safer bet) or embrace the emerging container revolution (the riskier but potentially more rewarding path).

What would you have done?

Both moves would have addressed the issue, but we took the leap into containerization by adopting AWS ECS. This transition required a substantial investment from our team, who spent countless hours writing Terraform code to orchestrate our new container ecosystem. The effort paid off dramatically: scaling operations that previously took more than 6 minutes in our Elastic Beanstalk environment now completed in under 60 seconds with our containerized approach, which pre-built all the required packages upfront into a very thin image.

We started building solid containers: build once, deploy anywhere, anytime, at any scale. That freedom gave us more confidence in addressing missing-package issues, reduced application deployment and scaling time, and enabled better rollback capabilities.
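The "build once" idea boils down to baking dependencies into the image at build time rather than at instance startup. A minimal, hypothetical Dockerfile for a Django service, where the Python version and WSGI module path are placeholders:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Copy and install requirements first, so Docker's layer cache
# skips this step when only application code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# "config.wsgi" is a placeholder for the project's WSGI module.
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000"]
```

Because the pip install happens once at build time, a new container starts serving in seconds, which is exactly the startup cost the AMI hack had been working around.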

A key enabler in this transition was our adoption of Terraform for infrastructure automation. Shifting from manual provisioning to Infrastructure as Code (IaC) allowed us to version, test, and replicate our entire stack with consistency. What used to take days of careful configuration now happened in minutes with a single command.
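As an illustration of what that IaC shift looks like in practice, here is a hedged Terraform sketch of an ECS Fargate service; the cluster, service, and variable names are hypothetical, not our production code:

```hcl
resource "aws_ecs_cluster" "main" {
  name = "web-cluster"
}

resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.web.arn # defined elsewhere
  desired_count   = 4
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.web.id]
  }
}
```

With definitions like this in version control, `terraform plan` shows every proposed change before `terraform apply` makes it, which is what turned days of careful manual configuration into minutes.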

The Kubernetes Gamble


During a rare period of relative calm, we took a hard look at our infrastructure. While ECS had served us well, we were feeling the limitations:

  • Vendor lock-in was becoming a concern.
  • Helm charts, a tool we desperately wanted to use, weren't supported.
  • Deployment workflows could be significantly streamlined.

We made another calculated bet: migrating select workloads to Amazon EKS (Elastic Kubernetes Service).
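The portability payoff comes from the fact that a Kubernetes manifest is cloud-agnostic: the same Deployment applies unchanged on EKS, Oracle's OKE, or GKE. A minimal, hypothetical example, with the registry and image name as placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2  # placeholder image
          ports:
            - containerPort: 8000
```

`kubectl apply -f deployment.yaml` works identically against any conformant cluster; only the cluster credentials change.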

This decision seemed like a small technical shift at the time, but it would prove to be one of our smartest moves. The bet paid off handsomely two years later, when we had to migrate one of our products to Oracle Cloud, and later to Google Cloud. The transition was not only faster and smoother than we anticipated; it was nearly painless, thanks to the groundwork we had laid.

The Perfect Storm: COVID-19 and Saudi Arabia Expansion


When COVID-19 hit, our traffic patterns didn't just change; they transformed. Globally, digital acceptance surged. Our platform wasn't just convenient anymore; it was essential, and safe to use from the comfort of customers' homes. Like many others, we shifted and adapted, but we did more than survive: we leveled up. How? Through an unexpected AWS credit, which gave us the freedom to explore and build the solid backbone of our infrastructure today.

Meanwhile, our growth in Saudi Arabia reached a tipping point where customers began demanding local data residency. But there was one massive problem: AWS didn't have a Saudi region.

After exhausting every imaginable option from piecing together solutions with local telecom providers to contemplating the eye-watering expense of AWS Outposts, we found ourselves staring at what felt like a technological step backward: Oracle Cloud, the only hyperscaler with a Saudi footprint at the time.

Our Kubernetes investment became our lifeline. In theory, our containerized architecture could run anywhere. In practice? Oracle Cloud tested not just this theory but our team's sanity. We migrated the product used by our Saudi customers to the Oracle Cloud region in Saudi Arabia.

The next two years were a technological horror story for those products. Load balancers failed during critical business hours. Machine provisioning could take a couple of minutes instead of under one minute. Simple tasks that took seconds in AWS now required custom-built workarounds and babysitting.

By the time Google Cloud Platform finally opened their Saudi region in 2024, our team was more than ready. The migration was completed in record time, our early commitment to Kubernetes and deliberate avoidance of vendor lock-in meant we could shift our entire infrastructure to Google Cloud with virtually no code changes. What might have been months of painful refactoring became weeks of smooth transition, vindicating our platform architecture decisions made years earlier.

Continuous Innovation: Thriving Against the Odds


Despite the massive infrastructure challenges we faced, we never stopped innovating on the product front. It's true: while our engineers were battling the storm and keeping the ship moving, our product teams were reimagining the future of YOUGotaGift. We launched a next-generation e-commerce platform, established an entirely new business unit (YOUProcess), and orchestrated complex workload migrations to the UAE region, all while keeping our existing services running smoothly.

Among our most impactful technical achievements was the adoption of serverless databases coupled with Global Application Load Balancers - a strategic move that reduced our overall response times by 30%. This wasn't just an internal win; AWS itself took notice, featuring our transformation in a case study that highlighted how businesses can evolve their architecture while scaling operations.

Looking back, it's clear that our success wasn't built on grand, sweeping initiatives but rather on a series of small, achievable decisions made consistently over the years. These incremental improvements, from adopting Kubernetes early to avoiding vendor lock-in, from embracing containerization to prioritizing resilient architecture, have collectively shaped us into one of the most robust, reliable, and adaptable technology companies in our industry.

The infrastructure journey we've shared today is just the beginning: a robust foundation upon which we continue to build. We're excited to bring you our next chapter, where we'll explore the tools and methodologies driving YOUGotaGift's digital ecosystem, including our strategic AI integration. Our technological evolution keeps accelerating, and the most exciting innovations are still ahead.
