How Do You Stream IPL to 25 Million People at the Same Time Without Breaking?

Most engineers will never face a traffic problem at this scale. But understanding how Hotstar solved it teaches you more about distributed systems, CDN architecture, and failure engineering than any textbook will.

In 2019, Hotstar set a world record — 25.3 million concurrent viewers streaming a single IPL match. Not total viewers. Not daily active users. Concurrent. At the same moment. Watching the same event. On a platform that had to stay alive under a load spike that would kill most production systems in minutes.

This is not a story about throwing money at servers. It is a story about architecture decisions, pre-computed content, geographic distribution, and the engineering discipline of knowing exactly how your system will fail before it does.


🎯 Quick Answer (30-Second Read)

  • The scale: 25.3 million concurrent viewers, 100+ million total match viewers, peak 15 million requests per minute
  • The core strategy: Treat live streaming as a content delivery problem, not a compute problem
  • What saved them: Multi-CDN architecture, adaptive bitrate streaming, pre-warming infrastructure hours before match start
  • What most people miss: The hardest part was not the stream — it was the comment section, scorecard APIs, and ancillary services that nearly brought everything down
  • The honest failure: Hotstar has had outages during IPL — the 2019 record came after years of incremental engineering improvements from previous failures
  • Lesson for engineers: Design for the specific failure mode of your traffic pattern — a concurrent spike is a completely different problem from sustained high traffic

The Scale of the Problem in Numbers

Before getting into the architecture, the numbers need context. 25 million concurrent viewers is not a web traffic problem. It is a physics problem.

At 25 million concurrent viewers watching at 1080p (approximately 8 Mbps per stream), the total bandwidth requirement is around 200 terabits per second. For reference, the entire internet backbone capacity of India in 2019 was estimated at around 20–30 terabits per second for general traffic.
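The back-of-envelope arithmetic above is worth making explicit — the viewer count and per-stream bitrate are the figures from the text, not measured values:

```python
# Back-of-envelope bandwidth estimate for the peak described above.
viewers = 25_000_000        # concurrent viewers
bitrate_mbps = 8            # ~1080p HLS stream, in megabits per second

total_mbps = viewers * bitrate_mbps
total_tbps = total_mbps / 1_000_000   # megabits -> terabits

print(f"{total_tbps:.0f} Tbps")  # -> 200 Tbps
```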

Hotstar did not serve 200 terabits per second from data centres. That is physically impossible from any centralised architecture. The only way to serve this traffic is to move the content as close to the viewer as possible — geographically, topologically, and in terms of caching hierarchy.

This is the central insight of Hotstar's architecture: live streaming at this scale is a content distribution problem, not a compute problem.

```mermaid
flowchart TD
    A([📱 25M Concurrent Viewers\nAcross India]) --> B[Multi-CDN Layer\nAkamai + Fastly + CloudFront]
    B --> C{Cache Hit?}
    C -->|✅ Yes — 95% of requests| D[Serve from CDN Edge\nNearest PoP to viewer]
    C -->|❌ No — 5% cache miss| E[Origin Shield\nRegional Cache Layer]
    E --> F{Shield Hit?}
    F -->|✅ Yes| D
    F -->|❌ No| G[Origin Servers\nVideo Packaging]
    G --> H[Live Encoder\nAdaptive Bitrate\nHLS Segments]
    H --> I[Object Storage\nS3 / GCS]
    I --> E
    D --> J([🎥 Video Delivered\nto Viewer])
    K[Ancillary Services] --> L[Scorecard API]
    K --> M[Comments System]
    K --> N[Authentication]
    K --> O[Ad Serving]
    L --> P{Rate Limited?}
    M --> P
    N --> P
    O --> P
    P -->|Yes| Q([✅ System Protected])
    P -->|No| R([❌ Cascade Risk])
    style A fill:#0f172a,color:#ffffff,stroke:#334155
    style J fill:#166534,color:#ffffff,stroke:#16a34a
    style Q fill:#166534,color:#ffffff,stroke:#16a34a
    style R fill:#7f1d1d,color:#ffffff,stroke:#ef4444
    style C fill:#78350f,color:#ffffff,stroke:#f59e0b
    style F fill:#78350f,color:#ffffff,stroke:#f59e0b
    style P fill:#78350f,color:#ffffff,stroke:#f59e0b
    style B fill:#312e81,color:#ffffff,stroke:#6366f1
    style D fill:#1e3a5f,color:#ffffff,stroke:#3b82f6
    style E fill:#1e293b,color:#ffffff,stroke:#475569
    style G fill:#1e293b,color:#ffffff,stroke:#475569
    style H fill:#1e293b,color:#ffffff,stroke:#475569
    style I fill:#1e293b,color:#ffffff,stroke:#475569
    style K fill:#7c2d12,color:#ffffff,stroke:#f97316
    style L fill:#1e293b,color:#ffffff,stroke:#475569
    style M fill:#1e293b,color:#ffffff,stroke:#475569
    style N fill:#1e293b,color:#ffffff,stroke:#475569
    style O fill:#1e293b,color:#ffffff,stroke:#475569
```

The CDN Architecture That Made It Possible

A Content Delivery Network works by caching content at edge nodes distributed geographically close to viewers. Instead of every viewer's request travelling to a central server, it travels to the nearest edge node — which might be in the same city or even the same ISP infrastructure.

For video on demand, CDN caching is straightforward. For live streaming, it is significantly harder.

Live video is delivered as a sequence of small segments — typically 2–6 seconds each in HLS (HTTP Live Streaming) format. Each segment is a small file. Once a segment is generated, it is effectively static — it does not change. This makes live video segments cacheable, but with an extremely short cache window.

Hotstar's CDN strategy had three layers:

Edge nodes — the outermost layer, geographically distributed across India with nodes inside major ISP networks. A viewer in Chennai hits a Chennai edge node. A viewer in Delhi hits a Delhi edge node. The physical distance between viewer and content is measured in milliseconds.

Origin shield — a regional caching layer between edge nodes and origin servers. When an edge node does not have a segment cached, it requests from the origin shield rather than the origin server directly. This means the origin server might receive one request for a segment that then serves 10,000 viewers through the shield and edge hierarchy.

Multi-CDN routing — Hotstar did not use a single CDN provider. They used multiple CDNs simultaneously with intelligent routing that selected the best-performing CDN for each viewer based on real-time performance data. If Akamai was degraded in a region, traffic shifted to Fastly or CloudFront automatically.
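The routing idea can be sketched in a few lines. The provider names match the text; the latency figures, error-rate threshold, and scoring rule are invented for illustration — a real system would feed in live real-user-monitoring data:

```python
# Toy multi-CDN selector: pick the best-performing healthy CDN per region.
def pick_cdn(metrics: dict[str, dict[str, float]],
             max_error_rate: float = 0.05) -> str:
    """metrics maps CDN name -> {"latency_ms": ..., "error_rate": ...}.
    Returns the lowest-latency CDN whose error rate is acceptable."""
    healthy = {name: m for name, m in metrics.items()
               if m["error_rate"] <= max_error_rate}
    if not healthy:
        healthy = metrics   # every CDN degraded: fall back to least-bad
    return min(healthy, key=lambda name: healthy[name]["latency_ms"])

chennai_metrics = {
    "akamai":     {"latency_ms": 40, "error_rate": 0.12},  # degraded
    "fastly":     {"latency_ms": 55, "error_rate": 0.01},
    "cloudfront": {"latency_ms": 70, "error_rate": 0.02},
}
print(pick_cdn(chennai_metrics))  # -> fastly (Akamai excluded on errors)
```

The key design point is that the fastest CDN on paper (Akamai here, at 40 ms) loses the region the moment its error rate crosses the health threshold.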

The cache hit rate target for live segments was above 95%. That means 95% of all video requests were served entirely from CDN edge nodes without touching Hotstar's origin infrastructure. The remaining 5% — which at 25 million viewers is still 1.25 million viewers — hit the origin shield or origin servers.


Adaptive Bitrate Streaming — Why Everyone Gets a Different Stream

No two viewers of the same IPL match watched the same video stream. A viewer on a 4G connection in a rural area watched a different bitrate than a viewer on a fibre connection in Mumbai — but both watched the same match without buffering (in the best case).

This is Adaptive Bitrate (ABR) streaming. Hotstar's encoder produced the same live video content at multiple quality levels simultaneously — typically six to eight different bitrates ranging from 200 Kbps for very low bandwidth to 8 Mbps for high-definition viewing.

The HLS manifest file — a small text file the player downloads first — lists all available quality levels and their segment URLs. The video player on the viewer's device monitors available bandwidth every few seconds and switches to the appropriate quality level dynamically.
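The player-side switching logic can be sketched as follows. The 200 Kbps and 8 Mbps endpoints come from the text; the intermediate ladder rungs and the headroom factor are illustrative assumptions:

```python
# Toy ABR selection: choose the highest rendition the measured bandwidth
# can sustain, with headroom so the playback buffer does not drain.
LADDER_KBPS = [200, 400, 800, 1600, 3200, 5000, 8000]  # illustrative rungs

def select_bitrate(measured_kbps: float, headroom: float = 0.8) -> int:
    """Pick the highest rung at or below measured bandwidth * headroom."""
    budget = measured_kbps * headroom
    eligible = [b for b in LADDER_KBPS if b <= budget]
    return max(eligible) if eligible else LADDER_KBPS[0]

print(select_bitrate(6000))  # 6000 * 0.8 = 4800 -> 3200 kbps rendition
print(select_bitrate(150))   # below the lowest rung -> 200 kbps floor
```

The headroom factor is why a 6 Mbps connection gets the 3.2 Mbps rendition rather than the 5 Mbps one: switching up too eagerly causes the oscillating-quality behaviour viewers notice most.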

From an infrastructure perspective, ABR streaming multiplies the CDN storage and bandwidth requirements by the number of quality levels. But it also means the system gracefully degrades for viewers on poor connections instead of failing completely — which is the engineering equivalent of graceful degradation at the product level.


Infrastructure Pre-Warming — The Work Done Before the First Ball

The most underrated part of Hotstar's engineering is what happened hours before match start.

A 25 million viewer spike does not ramp up gradually. It is nearly instantaneous — the moment the first ball is bowled, tens of millions of people open the app simultaneously. There is no warm-up period. The system goes from near-zero to maximum load in under two minutes.

Autoscaling cannot respond fast enough to this. Spinning up new EC2 instances takes 2–5 minutes. By the time autoscaling catches up to the demand spike, the system has already been under maximum load for several minutes without sufficient capacity.

Hotstar's solution was pre-warming: provisioning the full match-day infrastructure hours before the match started. Every application server, every cache node, every database connection pool was fully initialised and ready before a single viewer connected.

This required knowing the capacity requirement in advance — which came from load testing, historical match data, and audience projection modelling. The engineering team essentially pre-paid for the peak capacity, accepting the cost of idle servers in exchange for eliminating the cold-start problem.
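The capacity arithmetic behind pre-warming is simple once the peak projection exists. The 15 million requests per minute figure is from the Quick Answer above; the per-server throughput and safety margin are assumptions for illustration:

```python
import math

# Sketch of pre-warm capacity planning: provision for projected peak
# plus a safety margin, hours before the event. Figures illustrative.
def servers_to_prewarm(projected_peak_rps: int,
                       rps_per_server: int,
                       safety_margin: float = 0.3) -> int:
    """Server count for the projected peak with a fixed safety margin."""
    target_rps = projected_peak_rps * (1 + safety_margin)
    return math.ceil(target_rps / rps_per_server)

# 15M requests/min peak ~= 250k requests/sec.
print(servers_to_prewarm(250_000, rps_per_server=2_000))  # -> 163
```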


The Part Nobody Talks About — Ancillary Services Nearly Killed It

The video stream was the part Hotstar engineered most carefully. It was not the part that caused the most problems.

The ancillary services — the scorecard that updates every ball, the comments section, the authentication system, the ad serving platform — these were the components that repeatedly came close to cascading failure during peak IPL traffic.

Consider the scorecard API. Every viewer's app polls the scorecard endpoint to update the live score. At 25 million concurrent viewers polling every 5 seconds, that is 5 million API requests per second to a service that most engineering teams would not consider a primary reliability concern.
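The polling arithmetic, and the reason a short CDN TTL tames it, looks like this — the viewer count and poll interval are from the text, while the edge-node count is an invented illustration:

```python
# Polling load on the scorecard API, and what edge caching does to it.
viewers = 25_000_000
poll_interval_s = 5

client_rps = viewers / poll_interval_s    # requests leaving devices
print(f"{client_rps:,.0f} client req/s")  # -> 5,000,000 client req/s

# If responses are cached at the CDN edge for a short TTL, the origin
# sees roughly one request per edge node per TTL window instead.
edge_nodes = 1_000   # illustrative edge-node count
ttl_s = 3            # short TTL: viewers tolerate a score a few seconds old
origin_rps = edge_nodes / ttl_s
print(f"~{origin_rps:,.0f} origin req/s")
```

Five million requests per second collapse to a few hundred at the origin — the same insight as the video path, applied to a JSON endpoint.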

The comments section is worse. During exciting moments — a six, a wicket, a last-ball finish — comment volume spikes by 10–20x in seconds. The write load on the comments database during these moments is unlike any sustained traffic pattern and essentially impossible to smooth with standard caching strategies because the content is unique per comment.

Hotstar's solutions for ancillary services were pragmatic rather than elegant:

Scorecard: Aggressively cached at the CDN layer with a short TTL. Viewers might see a score that is 3–5 seconds old. During a cricket match, this is acceptable. The alternative — live polling at 5 million requests per second — is not.

Comments: Rate limited per user, per device, and per region. During extreme load, comment submission was throttled aggressively. Some comments were queued and delivered with delay rather than in real time.

Authentication: Tokens cached aggressively. Re-authentication during a match was avoided by extending session lifetimes on match days.

Ad serving: Pre-fetched and cached at the client level before match start. Ad calls during match load were the lowest-priority traffic and were shed first under pressure.
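The per-user comment throttling described above is typically built on something like a token bucket, which allows a small burst and then enforces a steady rate. This is a minimal sketch; the rates, burst size, and queue-or-drop policy are illustrative assumptions, not Hotstar's actual parameters:

```python
import time

class TokenBucket:
    """Minimal per-user token bucket: allow bursts up to `capacity`,
    refill at `rate` tokens/second, reject when the bucket is empty."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # throttled: queue or drop the comment

# One comment per 10 seconds, with a small burst allowance of 3.
bucket = TokenBucket(rate=0.1, capacity=3)
results = [bucket.allow() for _ in range(5)]
print(results)  # -> [True, True, True, False, False]
```

During a six or a wicket, the burst allowance absorbs the first reaction and the steady rate absorbs the rest — exactly the "delivered with delay rather than in real time" behaviour described above.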


My Take — What Hotstar's Engineering Actually Teaches Us

I have spent a lot of time thinking about what makes Hotstar's IPL engineering genuinely impressive versus what is just impressive-sounding. And the honest answer is that the video streaming architecture — the CDN hierarchy, the adaptive bitrate, the multi-CDN routing — is well-understood industry practice. The real engineering achievement is the operational discipline.

The pre-warming decision is what gets me. Provisioning full peak capacity hours before the match starts means paying for idle servers. That is a deliberate, uncomfortable decision — spending money on capacity that is not being used because the alternative is a cold-start failure during the highest-visibility moment of the year. Most engineering teams would fight that budget line. Hotstar's teams had the data from previous failures to win the argument.

The ancillary services problem is the part that I think represents the real lesson. The video stream was over-engineered in the best sense — it had redundancy, graceful degradation, multiple fallback layers. The ancillary services were engineering afterthoughts that nearly caused the headline failures. This is the pattern I see repeatedly: teams solve the obvious problem brilliantly and get blindsided by the adjacent one.

The worst version of this failure mode is when a peripheral service — a recommendations engine, a notification system, a logging pipeline — generates so much internal traffic during a spike that it overwhelms internal network capacity. The video was fine. The logs killed the servers.

The future of this problem is interesting. Edge computing and serverless are moving video delivery further toward the viewer, reducing origin load further. But the real unsolved problem is user-generated content at scale during concurrent spikes — comments, reactions, polls. That is a fundamentally harder problem than video delivery because it cannot be pre-computed or heavily cached. I think we are at least five years from seeing that solved elegantly at IPL scale.


What Other Engineering Teams Can Learn From This

You are probably not building for 25 million concurrent viewers. But the principles Hotstar used apply at every scale:

Separate your traffic types. Video delivery, API calls, and user-generated content have completely different traffic patterns, caching strategies, and failure modes. Architect them separately and apply different resilience patterns to each.

Know your ancillary services. Every system has components that seem secondary until they cause the primary failure. Map all the services that will receive traffic during your peak event — including the ones you did not build.

Pre-warm for spike traffic. If your traffic pattern is a known spike — a launch, a sale, a live event — autoscaling is not the answer. Pre-provision. Accept the cost. Eliminate the cold-start window.

Cache aggressively and degrade gracefully. A slightly stale scorecard is better than a crashed scorecard. A delayed comment is better than a failed comment. Define your degradation strategy before the spike, not during it.

Rate limit everything. Not just your public API. Internal services, comment endpoints, notification systems — everything that can receive amplified traffic during a spike should have a rate limit.
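The "stale but alive" trade-off from the scorecard lesson can be sketched as a read-through TTL cache — the TTL and the call counts here are illustrative:

```python
import time

class TTLCache:
    """Tiny read-through cache: serve a possibly-stale value for `ttl`
    seconds rather than hitting the backend on every request."""

    def __init__(self, ttl: float, fetch):
        self.ttl = ttl
        self.fetch = fetch        # function that hits the real backend
        self.value = None
        self.expires = 0.0
        self.backend_calls = 0

    def get(self):
        now = time.monotonic()
        if now >= self.expires:   # only refresh once per TTL window
            self.value = self.fetch()
            self.backend_calls += 1
            self.expires = now + self.ttl
        return self.value

score = {"runs": 142, "wickets": 3}
cache = TTLCache(ttl=3.0, fetch=lambda: dict(score))

for _ in range(10_000):          # 10k "viewers" polling within one TTL
    cache.get()
print(cache.backend_calls)  # -> 1 backend call serves all 10,000 reads
```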


Comparison: Streaming Architecture Approaches at Scale

| Approach | Max Concurrent Scale | Failure Mode | Cost Model | Best For |
|---|---|---|---|---|
| Single origin, no CDN | Thousands | Sudden collapse | Low initially | Development only |
| Single CDN | Millions | CDN PoP failures | Moderate | Standard streaming |
| Multi-CDN with routing | Tens of millions | Regional degradation | High | Live events |
| Multi-CDN + edge compute | 100M+ | Cost spikes | Very high | Global live events |
| P2P hybrid CDN | Hundreds of millions | Quality inconsistency | Low per viewer | Experimental |

Frequently Asked Questions

How did Hotstar prevent the app from crashing when 25 million people opened it simultaneously?
The match start spike was handled through a combination of pre-warmed infrastructure, aggressive CDN caching of the initial HLS manifest and first video segments, and staggered app-level reconnection logic. The app was designed to introduce small random delays in reconnection attempts so that 25 million simultaneous opens did not translate into 25 million simultaneous origin requests. This jitter — measured in milliseconds — was enough to smooth the spike at the origin layer.
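The reconnection-jitter idea generalises to any thundering-herd situation. A common shape is exponential backoff with full jitter — a sketch under assumed parameters (the base delay and cap here are invented, and the text notes Hotstar's actual jitter was on the order of milliseconds):

```python
import random

def reconnect_delay_ms(attempt: int, base_ms: int = 500,
                       max_ms: int = 8_000) -> float:
    """Exponential backoff with full jitter: the delay is uniform in
    [0, min(max, base * 2^attempt)), so simultaneous clients spread
    their retries across the whole window instead of retrying at once."""
    ceiling = min(max_ms, base_ms * (2 ** attempt))
    return random.uniform(0, ceiling)

# A thousand clients retrying "simultaneously" spread over [0, 500) ms.
delays = [reconnect_delay_ms(0) for _ in range(1000)]
print(f"spread: {min(delays):.1f} ms to {max(delays):.1f} ms")
```

Without the jitter, every client computes the same deterministic delay and the herd simply arrives again, intact, a moment later.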

What happens when a CDN node fails during a live match?
Multi-CDN routing detects CDN node degradation in real time through continuous health checks and performance monitoring. When a node or CDN provider degrades, traffic is automatically rerouted to healthy nodes or alternative CDN providers within seconds. Viewers experience a brief buffering event — typically 2–5 seconds — while the player reconnects through a new CDN path. This is the graceful degradation designed into the architecture.

Why is live streaming harder to cache than video on demand?
Video on demand content is fully generated before any viewer watches it — CDN caches can be pre-populated and warmed before traffic arrives. Live streaming generates content in real time as 2–6 second segments. Each segment must be encoded, packaged, pushed to origin storage, and then pulled through the CDN cache hierarchy to edge nodes — all within the segment duration. The cache warming window is the segment duration itself, which means the first viewer for each segment always hits a cache miss.

Did Hotstar ever crash during IPL?
Yes. Hotstar experienced significant outages during IPL seasons before the 2019 record — most notably during the 2018 IPL final where millions of viewers reported buffering and app crashes. The 2019 record was the result of years of incremental engineering improvements directly caused by previous failures. The outage post-mortems from 2017 and 2018 are the real engineering document behind the 2019 record — they just were not published publicly.

How does this apply to a team building a much smaller live streaming product?
The principles scale down. Use a CDN from day one — even for small products, CDN edge caching dramatically reduces origin load during spikes. Implement adaptive bitrate streaming — the HLS tooling is open source and well-documented. Rate limit your ancillary APIs aggressively. And load test your specific traffic pattern — a concurrent spike test is completely different from a sustained load test and most teams only run the latter.


Conclusion

Hotstar's 25 million concurrent viewer record was not an accident of good infrastructure. It was the outcome of specific architectural decisions — multi-CDN routing, adaptive bitrate streaming, aggressive caching hierarchies, pre-warmed infrastructure, and hard-won operational discipline from previous public failures.

The engineering lesson is not the scale. It is the mindset: treat your peak moment as a known, plannable event, not an unpredictable spike. Know your breaking points. Pre-provision for them. Rate limit everything downstream. And spend as much engineering time on your ancillary services as you spend on your primary traffic path — because that is where the failure will come from.

The video stream will probably be fine. It is the scorecard API that will take you down.


Related reads: Why Apps Crash During High Traffic · How SaaS Companies Actually Make Money · How to Deploy Next.js on Vercel Step-by-Step · How to Create a SaaS with Next.js and Supabase