Back-of-the-envelope estimation
- TechTutor

- Oct 19, 2024
- 5 min read
Back-of-the-envelope estimation refers to quick, rough calculations or estimates used to get a general idea of the feasibility, scope, or size of a project or problem. It is not intended to be highly accurate but gives a ballpark figure that helps guide decisions without needing deep analysis or formal tools.
Every developer should know how to perform back-of-the-envelope estimations because:
Rapid decision-making: During project discussions, a rough estimate helps decide if a particular task or solution is even worth pursuing.
Scope assessment: Developers often need to assess how long tasks will take, how much infrastructure might cost, or how many resources will be required.
Problem-solving: It allows developers to quickly evaluate different approaches to a problem.
Communication: It helps provide a simple, understandable estimate to stakeholders who need answers quickly.
How to Calculate Back-of-the-Envelope Estimation
Back-of-the-envelope estimation typically involves using simple approximations or known averages rather than complex formulas. The basic steps are:
Break the problem down into smaller parts: Identify the main components involved.
Use reasonable assumptions: Make educated guesses based on past experience or known data.
Perform simple calculations: Add up or multiply the approximations to get a rough estimate.
Example of Back-of-the-Envelope Estimation
Let’s say you’re building a feature that allows users to upload and process images. You want to estimate how much storage space will be needed for 1 million images.
Step 1: Break down the problem
You need to estimate the average size of an image file.
Estimate how many images will be uploaded.
Calculate the total space needed.
Step 2: Use reasonable assumptions
Assume that an average image size is about 2 MB.
You expect around 1 million images to be uploaded over the period (for example, one image per user).
Step 3: Perform simple calculations
The total storage space required = Average image size × Number of images.
2 MB/image × 1,000,000 images = 2,000,000 MB
Convert MB to GB: 2,000,000 MB ÷ 1024 MB/GB ≈ 1,953 GB
So, roughly 1.9 TB of storage would be needed for 1 million images.
This quick calculation gives you a rough idea of the storage requirements without having to dive into precise measurements or data collection.
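The three steps above can be sketched in a few lines of Python. The 2 MB average size and the 1,000,000 image count are the assumptions from this example, not measured values:

```python
# Rough storage estimate for uploaded images (assumed inputs).
AVG_IMAGE_MB = 2          # assumed average image size
NUM_IMAGES = 1_000_000    # assumed upload volume

total_mb = AVG_IMAGE_MB * NUM_IMAGES   # 2,000,000 MB
total_gb = total_mb / 1024             # ~1,953 GB
total_tb = total_gb / 1024             # ~1.9 TB

print(f"{total_gb:.0f} GB ≈ {total_tb:.1f} TB")
```

Changing either assumption (say, 5 MB photos from modern phones) immediately shows how sensitive the estimate is to its inputs.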
In back-of-the-envelope estimations, concepts like powers of two, availability numbers, and latency numbers are frequently used to make quick, high-level calculations. Let’s break down each concept with examples and why they are useful for programmers.
Powers of Two
Many aspects of computing—like memory, storage, and network operations—are governed by powers of two because computers operate in binary. Estimating with powers of two allows quick approximations for data sizes, memory limits, and scaling calculations.
Example:
2^10 = 1024 (about 1 KB)
2^20 = 1,048,576 (about 1 MB)
2^30 = 1,073,741,824 (about 1 GB)
For a rough estimation, you can assume:
1 KB = 10^3 bytes
1 MB = 10^6 bytes
1 GB = 10^9 bytes
If you’re dealing with scaling a web application and you want to estimate the storage for 1 million users where each user has around 1 MB of data, you could quickly estimate:
1 MB/user × 10^6 users = 10^6 MB = 1 TB
This kind of estimation helps programmers think in terms of real-world implications, like how much memory or disk space might be needed.
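As a quick sketch, the decimal shortcuts can be checked against the exact binary values in Python; the 1 MB-per-user figure is the assumption from the example above:

```python
# Decimal approximations used for quick estimates,
# next to the exact binary (power-of-two) values.
KB, MB, GB = 10**3, 10**6, 10**9        # rough, decimal
KiB, MiB, GiB = 2**10, 2**20, 2**30     # exact, binary

# 1 million users × 1 MB each ≈ 1 TB
total_bytes = 1_000_000 * MB
print(total_bytes / 10**12)             # 1.0 (TB)
```

The decimal and binary values differ by a few percent, which is well inside the error margin of a back-of-the-envelope estimate.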
Availability Numbers
Availability is often expressed in "nines", where 99.9% availability is called "three nines." The more nines, the higher the system's availability (or uptime). This is important when estimating system reliability or uptime during design discussions or architectural planning.
Example:
99.9% availability: "Three nines" = ~8.76 hours of downtime per year.
99.99% availability: "Four nines" = ~52.56 minutes of downtime per year.
99.999% availability: "Five nines" = ~5.26 minutes of downtime per year.
Back-of-the-envelope example:
If you aim for 99.9% availability for a service over one year (which has 365 days × 24 hours = 8,760 hours), you can quickly calculate:
Total annual downtime = 8,760 hours × (1 − 0.999) = 8.76 hours/year
This means your system will experience roughly 8.76 hours of downtime per year. For higher availability, you’d need to build in redundancy or improve failover mechanisms, which could then reduce that downtime to minutes per year.
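The downtime formula above generalizes to any availability target. A minimal sketch (the `downtime_hours` helper is just for illustration):

```python
# Hours of downtime per year implied by an availability target.
HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_hours(availability: float) -> float:
    """Annual downtime in hours for a given availability (e.g. 0.999)."""
    return HOURS_PER_YEAR * (1 - availability)

for label, a in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label} -> {downtime_hours(a):.2f} h/year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why the cost of availability grows so steeply.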
Latency Numbers Every Programmer Should Know
Latency refers to the delay between a request and a response, and understanding common latencies helps programmers make reasonable assumptions when designing systems. Here are rough latency numbers for various operations:
L1 cache reference: ~0.5 ns
Main memory reference: ~100 ns
SSD random read: ~150 µs
Read 1 MB sequentially from memory: ~250 µs
Round trip within the same data center: ~500 µs
Read 1 MB sequentially from SSD: ~1 ms
Disk seek: ~10 ms
Network round trip between continents: ~150 ms
Back-of-the-envelope example:
If you are estimating the response time for a system that involves:
Reading data from memory: ~100 ns
Writing data to an SSD: ~150 µs
A network request to another data center: ~1 ms
You can sum these latency numbers to get a rough estimate:
100 ns + 150 µs + 1 ms ≈ 1.150 milliseconds total response time.
ns = nanosecond | µs = microsecond | ms = millisecond
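The main trap in sums like this is mixing units, so it helps to convert everything to one unit first. A minimal sketch using the three latencies from this example:

```python
# Sum rough latencies in a single common unit (milliseconds).
NS = 1e-6   # nanoseconds  -> ms
US = 1e-3   # microseconds -> ms

memory_read = 100 * NS    # ~100 ns
ssd_write   = 150 * US    # ~150 µs
network_rtt = 1.0         # ~1 ms

total_ms = memory_read + ssd_write + network_rtt
print(f"{total_ms:.3f} ms")   # ≈ 1.150 ms
```

Note how the memory read contributes essentially nothing: the network hop dominates, which is exactly the kind of insight these numbers are meant to surface.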
Contact Center Example
You want to estimate the resources required to handle 1,000 concurrent agents in your contact center. These agents handle voice calls, and you are responsible for ensuring that your system can support the operations in terms of bandwidth, server capacity, and storage.
We'll estimate three key components:
Bandwidth for voice calls.
Server capacity for managing call flows.
Storage for call recordings.
1. Estimating Bandwidth for Voice Calls
Assumptions:
Each voice call is transmitted using a codec like G.711, which requires approximately 87.2 Kbps per call (including overhead).
You have 1,000 concurrent agents, each potentially on a call.
Calculation:
Bandwidth per call = 87.2 Kbps (upstream + downstream)
Total bandwidth required = 1,000 agents × 87.2 Kbps = 87,200 Kbps = 87.2 Mbps
Thus, you will need at least 87.2 Mbps of bandwidth to handle the maximum load of 1,000 concurrent voice calls.
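The bandwidth calculation is a single multiplication; the 87.2 Kbps per-call figure (G.711 payload plus packet overhead) and the 1,000-agent count are the assumptions stated above:

```python
# Bandwidth for concurrent voice calls (assumed inputs).
KBPS_PER_CALL = 87.2   # assumed G.711 call including overhead
AGENTS = 1_000         # assumed concurrent calls

total_kbps = AGENTS * KBPS_PER_CALL
print(f"{total_kbps / 1000:.1f} Mbps")   # 87.2 Mbps
```

In practice you would add headroom on top of this floor, but the envelope number tells you whether a 100 Mbps link is even in the right ballpark.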
2. Estimating Server Capacity for Call Handling
Assumptions:
Each agent call requires the system to process 10 requests per second for operations like handling interactive voice response (IVR), call routing, agent monitoring, etc.
Each request takes about 5 milliseconds to process.
Assume your servers can handle 1,000 requests per second.
Calculation:
Total requests per second = 1,000 agents × 10 requests/second = 10,000 requests/second
If each server handles 1,000 requests/second, you need 10 servers to handle 1,000 agents concurrently.
Thus, your system will need 10 servers (assuming no redundancy or failover) to manage the operations smoothly.
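The server count follows from dividing total load by per-server capacity and rounding up, since you cannot deploy a fraction of a server. All three inputs are the assumptions above:

```python
import math

# Servers needed for the call-processing load (assumed inputs).
AGENTS = 1_000
REQS_PER_AGENT = 10   # assumed requests/second per agent
CAPACITY = 1_000      # assumed requests/second per server

total_rps = AGENTS * REQS_PER_AGENT          # 10,000 req/s
servers = math.ceil(total_rps / CAPACITY)    # round up: no partial servers
print(servers)                               # 10
```

Rounding up matters once the numbers stop dividing evenly: 10,500 req/s would need 11 servers, not 10.5.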
3. Estimating Storage for Call Recordings
Assumptions:
Calls are recorded and stored as compressed files.
A 1-minute recording with compression (e.g., G.729) takes up about 0.5 MB of storage.
Each agent handles an average of 50 calls per day, with an average call length of 5 minutes.
You need to store recordings for 30 days.
Calculation:
Storage per call = 5 minutes × 0.5 MB/minute = 2.5 MB per call
Storage per agent per day = 50 calls/day × 2.5 MB/call = 125 MB/agent/day
Total storage for 1,000 agents per day = 1,000 agents × 125 MB = 125,000 MB = 125 GB/day
Storage for 30 days = 125 GB/day × 30 days = 3,750 GB = 3.75 TB
Thus, you will need approximately 3.75 TB of storage capacity to keep 30 days’ worth of call recordings for 1,000 agents.
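The storage chain multiplies the same assumed inputs step by step, which makes it easy to see which factor to attack if 3.75 TB turns out to be too expensive:

```python
# 30-day storage for call recordings (all inputs are assumptions).
MB_PER_MINUTE = 0.5            # assumed compressed recording rate
CALL_MINUTES = 5               # assumed average call length
CALLS_PER_AGENT_PER_DAY = 50
AGENTS = 1_000
RETENTION_DAYS = 30

per_call_mb = CALL_MINUTES * MB_PER_MINUTE                  # 2.5 MB
per_agent_day_mb = CALLS_PER_AGENT_PER_DAY * per_call_mb    # 125 MB
total_gb_per_day = AGENTS * per_agent_day_mb / 1000         # 125 GB/day
total_tb = total_gb_per_day * RETENTION_DAYS / 1000         # 3.75 TB
print(total_tb)
```

Halving the retention window or switching to a more aggressive codec each halves the total, and the sketch shows that immediately.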
Why It’s Useful for Developers
Speed: This allows you to make decisions quickly in meetings.
Insight: You can identify issues early, such as noticing if storage, processing time, or costs are likely to become a problem.
Communication: It’s easy to explain rough estimates to non-technical stakeholders without getting bogged down in details.