Retailers and online sites in the US take the holiday shopping season seriously. Demand for promotions is strong, and marketing usually centers on bringing as many shoppers as possible into stores and onto eCommerce sites. In some cases, marketers have invented shopping days of their own, such as Amazon's Prime Day. However, total traffic volume in the US pales in comparison with larger Asian markets. Global eCommerce giant Alibaba built a similar promotional day in China around November 11th, known as Singles Day. In 2018, Alibaba reported that its Singles Day sales surpassed US$30 billion. To put that number in context, Amazon's Prime Day in 2018 was expected to account for US$4 billion in sales, while the biggest eCommerce shopping day in the US so far has been Cyber Monday 2017, with sales of US$6.6 billion. Even the largest US site wouldn't be able to handle the traffic and sales volumes of Alibaba.
In 2018, major US eCommerce sites, including Amazon and Wal-Mart, experienced outages of several hours. For Amazon, the major outage came during Prime Day, its much-hyped, heavily publicized, self-created midsummer shopping event. For Wal-Mart, the major outage came on Thanksgiving Day, just as it was kicking off many of its Black Friday promotions early.
That these outages came during the busiest stretches of the promotional period indicates that these retailers still have a challenge to overcome. All of the marketing and hype-building is wasted time and money if customers can't take advantage of the promotions and offers.
These recurring outages point to a lack of insight into expected traffic volumes, weak burst-capacity planning, and insufficient pre-event testing against those expected numbers. Most eCommerce sites test either on their production environment during overnight hours or on a separate, scaled-down performance testing environment. Typically, such scaled-down environments are sized at 50% or 33% of production. Testing on them requires a matching downward adjustment to traffic volumes. Rather than testing the resiliency of the production environment at multiples of expected traffic and transaction rates, these sites end up testing a fraction of the volume and certifying the production environment by ratio.
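The ratio-based adjustment described above can be made concrete with a little arithmetic. The following is only a minimal sketch: the function names and the peak figure of 12,000 requests per second are hypothetical, and the extrapolation assumes that capacity scales linearly with environment size, which is exactly the assumption that ratio-based certification leans on.

```python
def scaled_test_load(expected_peak_rps: float, env_ratio: float,
                     traffic_multiple: float = 1.0) -> float:
    """Return the load to drive at a scaled-down test environment.

    env_ratio is the test environment's size relative to production
    (0.5 for a 50% environment). traffic_multiple lets the test target
    a multiple of the expected peak instead of the peak itself.
    """
    if not 0 < env_ratio <= 1:
        raise ValueError("env_ratio must be greater than 0 and at most 1")
    if traffic_multiple <= 0:
        raise ValueError("traffic_multiple must be positive")
    return expected_peak_rps * traffic_multiple * env_ratio


def extrapolated_production_capacity(measured_test_rps: float,
                                     env_ratio: float) -> float:
    """Extrapolate a test result to production by ratio.

    Assumes linear scaling (an assumption, not a measured fact):
    shared components such as databases or load balancers may not
    scale this way.
    """
    if not 0 < env_ratio <= 1:
        raise ValueError("env_ratio must be greater than 0 and at most 1")
    return measured_test_rps / env_ratio


if __name__ == "__main__":
    expected_peak = 12_000.0  # hypothetical expected peak, requests/sec
    for ratio in (0.50, 0.33):
        load = scaled_test_load(expected_peak, ratio)
        print(f"{ratio:.0%} environment: drive about {load:,.0f} requests/sec")
```

A test that passes at the scaled load only certifies production to the extent that the linear-scaling assumption holds, which is why testing production-sized capacity at multiples of expected traffic gives a stronger result.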
With cloud computing rapidly becoming a viable alternative to on-premises hosting, there is value in moving the performance testing environment to the cloud. In a pinch, such an environment can also function as a disaster recovery environment. Delivering it will take planning and proper execution. Because cloud computing is elastic and billed by usage, a cloud-based environment can cost a small fraction of an on-premises environment of similar size and scale. Some optimization will be required in order to take advantage of cloud features. While some of it may come from code changes to adapt to the cloud, most of it falls within the realm of the operations team that typically manages the environments.
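To illustrate why usage-based billing matters for a test environment that runs only occasionally, here is a rough sketch of the arithmetic. Every number in it is hypothetical (the hourly rate and the hours of testing are made up for illustration), and it compares only running hours; it does not model on-premises hardware, licensing, or staffing costs.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)


def usage_billed_cost(hourly_rate: float, hours_used: float) -> float:
    """Cost of an environment that is billed only while it is running."""
    if hourly_rate < 0 or hours_used < 0:
        raise ValueError("hourly_rate and hours_used must be non-negative")
    return hourly_rate * hours_used


def usage_fraction(hours_used: float,
                   hours_available: float = HOURS_PER_MONTH) -> float:
    """Share of the month the environment actually runs."""
    if hours_available <= 0:
        raise ValueError("hours_available must be positive")
    return hours_used / hours_available


if __name__ == "__main__":
    rate = 40.0        # hypothetical hourly rate for the whole environment
    test_hours = 20.0  # hypothetical hours of load testing per month
    elastic = usage_billed_cost(rate, test_hours)
    always_on = usage_billed_cost(rate, HOURS_PER_MONTH)
    print(f"Running only during tests: {elastic:,.2f}")
    print(f"Running all month:         {always_on:,.2f}")
    print(f"Share of the month in use: {usage_fraction(test_hours):.1%}")
```

The point of the sketch is the ratio rather than the amounts: an environment that exists only for the hours it is tested is billed for those hours alone.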
More on the cloud computing advantages in future posts...