“Here’s the bottom line: the number one driver for shipping products quicker is by focusing on the important ones and killing the unimportant ones.”
“You might be thinking: ‘True, but couldn’t we also increase the average completion rate’? You’re right, but the impact of doing that is much lower than reducing the TIP (things-in-process) — that is, influencing the average completion rate is rather difficult and is often a function of available resources, scope creep, market demands and changes, etc.”
– Pete Abilla, Nov 2006, Little’s Law for Product Development
A few weeks back Arne Roock, a fellow kanban/lean-agile practitioner, pinged me with a question related to Little’s Law and utilization. Paraphrasing, it was essentially this: “Queuing theory states that the speed of networks decreases dramatically (non-linearly) as utilization increases above 80%. But according to Little’s Law (given a stable system), Lead Time increases linearly if we increase WIP (which increases utilization). Why doesn’t Little’s Law show lead time going up exponentially from a certain point on (e.g., past 80% utilization)?” This led to an exchange of a couple of emails discussing the use of Little’s Law, and why and how, in the software development context, an increase in work-in-progress could result in a non-linear increase in lead time. This post captures and reflects some of the thoughts we shared. I assume you’ve wondered about similar questions too. If so, I hope you’ll find this post interesting and helpful.
Little’s Law: In a Nutshell
First, let’s get on the same page with a quick review of Little’s Law.
Note: I see Little’s Law represented and labeled various ways when applied to different contexts and scenarios. The versions here are based primarily on my personal application of it in the software development context.
Little’s Law is typically defined as follows:
L = λ W
L = average number of work items in a queuing system over a period of time
λ = average number of work items arriving into the queuing system over some unit time
W = average time a work item spends overall in the queuing system
Applying some basic algebra, Little’s Law is often re-organized and re-labeled for use in the software development context as follows:
LT = WIP / Throughput
LT = average “lead time” of how long it takes a work item to move through the workflow from start to delivery
WIP = average # of work items you’ll see in progress at any time in the workflow over some period of time of interest
Throughput = average # of work items exiting the workflow per unit time
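Note that the changes include substituting Throughput, an “average completion rate” (ACR), for the average arrival rate in the original form. The substitution works because, in a stable system over a long enough period, work items exit the workflow at the same average rate at which they arrive. In the software development context, the work items are typically things like a new user story, a change request, or a bug fix.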
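To make the three averages concrete, here is a minimal Python sketch that estimates them from a log of (start, finish) times. The function name `little_from_log` and the five-story log are hypothetical. The sketch assumes an observation window that starts and ends with nothing in process, which is one simple way to satisfy the stable-system assumption; under that assumption the arrival rate and Throughput come out equal, which is why the substitution above works.

```python
def little_from_log(items, window_start, window_end):
    """Estimate average WIP, arrival rate, Throughput, and LT from (start, finish) pairs.

    Assumes the window starts and ends with no work in process, so every
    item both starts and finishes inside [window_start, window_end].
    """
    span = window_end - window_start
    # Time-averaged WIP: total item-time spent in process divided by window length.
    item_time = sum(max(0, min(f, window_end) - max(s, window_start)) for s, f in items)
    avg_wip = item_time / span
    arrival_rate = sum(1 for s, _ in items if window_start <= s <= window_end) / span
    finished = [(s, f) for s, f in items if window_start <= f <= window_end]
    throughput = len(finished) / span  # completions per unit time
    avg_lt = sum(f - s for s, f in finished) / len(finished)
    return avg_wip, arrival_rate, throughput, avg_lt

# Hypothetical log: (start day, finish day) for five stories, observed over days 0-10.
log = [(0, 3), (1, 5), (2, 4), (4, 8), (6, 10)]
wip, arrivals, th, lt = little_from_log(log, 0, 10)
print(f"WIP={wip:.1f}  arrivals={arrivals:.1f}/day  TH={th:.1f}/day  LT={lt:.1f}  WIP/TH={wip / th:.1f}")
```

With this log the sketch reports an average WIP of 1.7 stories, an arrival rate and Throughput of 0.5 stories/day, and an average LT of 3.4 days, which matches 1.7 / 0.5.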
Note: See my earlier post here for a bit more discussion of Lead time versus Cycle time in a software development context and how it might impact a discussion on duration or effort tracking.
Little’s Law: A Quick Test Drive
Now let’s take the second form of Little’s Law above for a quick “classroom context” test drive. It’s a simple ratio, so if we hold Throughput (the denominator) constant, it’s easy to see that when WIP (the numerator) increases, LT (the quotient) increases proportionally. Nothing difficult about this classroom context and scenario, right?
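A tiny Python sketch of that classroom test drive follows; the `lead_time` helper and the constant Throughput of 2 items/day are hypothetical values chosen for illustration.

```python
def lead_time(wip, throughput):
    """Little's Law, second form: LT = WIP / Throughput (same units, stable system)."""
    return wip / throughput

THROUGHPUT = 2.0  # hypothetical: 2 work items completed per day, held constant
for wip in (4, 8, 16):
    print(f"WIP={wip:>2} items  LT={lead_time(wip, THROUGHPUT):.1f} days")
```

Each doubling of WIP doubles LT (2, 4, then 8 days), which is the proportional behavior you get only because Throughput never moves.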
But how often does the assumption that we can hold Throughput constant as WIP increases actually hold up? We’ll revisit this question in a bit, but first let’s also get on the same page with the assumptions Little’s Law is based on.
There are only a few, and here’s my short version: all three parameters in Little’s Law have to be in the “same units”, and each must be a “long running” average in a “stable” system. The first assumption is easy. If you’re specifying Throughput in work items per day, then WIP needs to be measured in “equivalent” work items. For example, in a software development context, if you measure Throughput as the average number of task-level work items completed per day, don’t pair it with an average WIP measured in story-level work items.
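One way to guard against mixing units is to carry the work-item granularity along with each measurement. This is a minimal Python sketch, not a standard tool; the `Measure` class, its `unit` labels, and the `lead_time` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measure:
    value: float
    unit: str  # hypothetical label for work-item granularity, e.g. "story" or "task"

def lead_time(wip: Measure, throughput_per_day: Measure) -> float:
    """LT in days = WIP / Throughput, refusing to mix work-item granularities."""
    if wip.unit != throughput_per_day.unit:
        raise ValueError(
            f"WIP counted in {wip.unit}s but Throughput in {throughput_per_day.unit}s"
        )
    return wip.value / throughput_per_day.value

print(lead_time(Measure(12, "story"), Measure(3, "story")))  # consistent units
try:
    lead_time(Measure(12, "story"), Measure(20, "task"))  # stories vs. tasks
except ValueError as err:
    print("rejected:", err)
```

The first call returns 4.0 days; the second is rejected rather than silently producing a meaningless LT.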
However, the last two assumptions are a bit more complicated because there are nuances specific to different contexts and scenarios. Being aware that these nuances exist, and understanding them, is important, but discussing them in detail is beyond the scope of this post. For now, I’ll point you to this reference for more information on them (see Ch. 5 – Little’s Law, by Dr. Stephen C. Graves, Professor of Mechanical Engineering and Engineering Systems at M.I.T.). Still, I’ll include just enough detail here to show why and how they are important.
Little’s Law: Long Running Averages and a Stable System
Time to get our feet wet with Little’s Law and these assumptions using a simple real world context and scenario. My sketch was inspired by an example that used water tanks to explain Little’s Law (unfortunately, the source web page is no longer available).
In the left tank, water arrives and exits at the same rate (2 gal./hr.), so over time Throughput, the water exiting, remains constant. The water level in the tank, WIP, also remains constant over time (5 gal.). Since no water gets stuck to the side of the tank, the average time, LT, it takes any one molecule (or gal.) to pass through the tank is constant over time. Again, as long as we’re consistent for both Throughput and WIP, we can use units of molecules, gals., or whatever is best for our context.
Applying Little’s Law, 5 gal. / 2 gal. per hr. yields an LT of 2.5 hrs. Let’s step through this too. If water stopped flowing into the tank, all the water would drain out in 2.5 hours at the exit rate. Replacing the water at the same rate it exits simply keeps the water level at 5 gal., while water entering passes through the tank in 2.5 hours on average.
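To step through the left tank numerically, here is a minimal discrete-time Python sketch. It assumes water can be modeled as 1-gallon parcels that leave in first-in, first-out order, with half-hour ticks so that 2 gal./hr. becomes 1 gallon per tick; `simulate_tank` and its parameters are hypothetical names for illustration.

```python
from collections import deque

def simulate_tank(initial_gal, gal_in_per_tick, gal_out_per_tick, ticks, dt_hours=0.5):
    """FIFO tank of 1-gallon parcels; returns (avg WIP gal, Throughput gal/hr, avg LT hr).

    Each tick, water exits first and then new water enters.
    """
    tank = deque([0.0] * initial_gal)  # entry timestamps; preloaded water is stamped t=0
    wip_samples, lead_times, exited = [], [], 0
    for k in range(1, ticks + 1):
        now = k * dt_hours
        for _ in range(min(gal_out_per_tick, len(tank))):
            entered = tank.popleft()
            exited += 1
            if entered > 0.0:  # only time parcels whose entry we observed
                lead_times.append(now - entered)
        tank.extend([now] * gal_in_per_tick)
        wip_samples.append(len(tank))
    avg_wip = sum(wip_samples) / len(wip_samples)
    throughput = exited / (ticks * dt_hours)
    avg_lt = sum(lead_times) / len(lead_times)
    return avg_wip, throughput, avg_lt

wip, th, lt = simulate_tank(initial_gal=5, gal_in_per_tick=1, gal_out_per_tick=1, ticks=200)
print(f"WIP={wip} gal  TH={th} gal/hr  LT={lt} hr  WIP/TH={wip / th} hr")
```

Running it for 200 ticks (100 hours) gives an average WIP of 5 gal. and a Throughput of 2 gal./hr., and every tracked parcel spends 2.5 hours in the tank, matching 5 / 2.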
Now we’ll utilize more of the water tank: increase the inflow until the water level reaches the new mark (20 gal.), then back the inflow off to a rate again equal to the water exiting the tank (2 gal./hr.). We keep Throughput, the water exiting the tank, constant during this time, so the same number of molecules (or gals.) exit the tank as before the water began arriving at a faster rate.
Note: I’m focusing on the case where WIP increases, since this is really the heart of the original question about Little’s Law and a system’s utilization as it increases. I acknowledge there is another side, what happens when we lower WIP beyond a certain point, which I don’t address at all in this post.
While water arrives faster than it exits, WIP is not a long running average; it is increasing over this time, right? Is the system in a stable state during this time? If the WIP of water in the tank is increasing, by Little’s Law we’d expect an increasing average LT for our water molecules (or gals.), right? But is it appropriate to use Little’s Law while water is arriving faster than it exits?
Throughput was constant as WIP increased during this time. Still, in our simple water tank example the system becomes “stable” again only when the flow of water arriving in the tank returns to a rate equal to the water exiting. Once the system is stable, Little’s Law yields a new long running average LT at this higher WIP: 20 gal. / 2 gal. per hr. yields a new long running average LT of 10 hrs. The increase in LT is again proportional to the increase in WIP: WIP is now four times greater than before, and so is LT (10 hrs. versus 2.5 hrs.). Note, though, that none of the assumptions Little’s Law is based on mention utilization levels.
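The fill-then-settle scenario can be sketched with a similar hypothetical FIFO model in Python (1-gallon parcels, half-hour ticks, water exits before new water enters; `run_tank` is an illustrative name). While the inflow runs at 4 gal./hr., each newly arriving parcel has more water ahead of it than the one before, so there is no single long running LT to compute; only after the inflow drops back to 2 gal./hr. does every parcel spend the same time in the tank.

```python
from collections import deque

def run_tank(initial_gal, inflow_per_tick, out_per_tick=1, dt_hours=0.5):
    """FIFO tank of 1-gallon parcels. Each tick, out_per_tick gallons exit, then
    inflow_per_tick[k] gallons enter. Returns WIP after each tick and a list of
    (entry_time, lead_time) for parcels that entered during the run and exited."""
    tank = deque([None] * initial_gal)  # None = preloaded water, entry time unknown
    wip, lead = [], []
    for k, gal_in in enumerate(inflow_per_tick, start=1):
        now = k * dt_hours
        for _ in range(min(out_per_tick, len(tank))):
            entered = tank.popleft()
            if entered is not None:
                lead.append((entered, now - entered))
        tank.extend([now] * gal_in)
        wip.append(len(tank))
    return wip, lead

# Fill from 5 to 20 gal. at 4 gal./hr. in (2 gal./hr. out), then match the outflow again.
schedule = [2] * 15 + [1] * 185  # gallons entering per half-hour tick
wip, lead = run_tank(5, schedule)
filling = [lt for t, lt in lead if t <= 7.5]  # entered while WIP was rising
stable = [lt for t, lt in lead if t > 7.5]    # entered after WIP settled at 20 gal.
print(wip[14], min(filling), max(filling), sum(stable) / len(stable))
```

Parcels entering during the fill take anywhere from 2.5 to 10 hours to pass through, while every parcel entering after WIP settles at 20 gal. takes 10 hours, the value Little’s Law gives once the system is stable again.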
Little’s Law: Utilization Effect
So, is Little’s Law a “linear” relationship such that an increase in WIP always produces a proportional increase in LT? Let’s revisit that earlier question: can we assume Throughput stays constant in the second form of Little’s Law as we increase WIP?
The water tank example is a pretty simple context and scenario for using Little’s Law, as it models a fairly “continuous and uniform workflow.” For me, the value of walking through this simple example is that it helps me see the importance of understanding the assumptions. The why and how of using the same units is pretty evident. Hopefully now, so is the why and how of having long running averages and a stable system before using Little’s Law.
Note: For me, another critical point to catch here is the importance of a WIP limit. Even a single limit set at the overall system level to start, even one that appears “impractically high” at first, helps create and maintain a stable system. That enables the (effective) use of this helpful tool (Little’s Law) for learning, understanding, managing, and more effectively validating whether changes improved your workflow.
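A minimal Python sketch of that stabilizing effect, under hypothetical numbers: 3 requests arrive per day, the team completes 2 per day, and new work is pulled from the backlog into the workflow only while WIP is below the limit. `wip_over_time` is an illustrative name, and this is a toy daily model rather than a description of any particular kanban tool.

```python
def wip_over_time(days, arrivals_per_day, completions_per_day, wip_limit=None):
    """Daily toy model: returns WIP at the end of each day (backlog is not counted as WIP)."""
    backlog = wip = 0
    history = []
    for _ in range(days):
        backlog += arrivals_per_day
        wip -= min(completions_per_day, wip)  # finish work already in process
        pulled = backlog if wip_limit is None else min(backlog, wip_limit - wip)
        backlog -= pulled  # start new work, but only up to the WIP limit
        wip += pulled
        history.append(wip)
    return history

unlimited = wip_over_time(30, arrivals_per_day=3, completions_per_day=2)
limited = wip_over_time(30, arrivals_per_day=3, completions_per_day=2, wip_limit=10)
print("no limit:", unlimited[-5:])
print("limit 10:", limited[-5:], "-> LT =", limited[-1] / 2, "days")
```

Without a limit, WIP climbs by one item every day (32 items after 30 days), so there is never a long running average to plug into Little’s Law. With a limit of 10, WIP settles at 10 by day 8, and LT settles at 10 / 2 = 5 days. The backlog still grows, because arrivals exceed Throughput, but that queue now sits visibly outside the workflow instead of inflating WIP.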
When someone says they’re not using “explicit” work limits, yet claim they’re “successfully” delivering mostly on time, and there is a clearly sizeable backlog of pending work, after “digging” a little deeper I commonly find one or both of the following.
First, someone (usually an experienced, skillful project manager) is employing “implicit” work limits: effectively deferring some work items, often by reducing the scope or fidelity of features, while managing expectations and mitigating visible risks. In my opinion this is “good project management”, but I don’t see it often. Second, quality could really be much better, and a significant amount of “hero effort” is needed.
The disadvantage I observe with the implicit approach is that the lack of visibility (or poor visibility) makes the process even more subjective and often more political. It enables more circumventing, encourages more context switching, often requires more “unplanned” overtime, often produces more bugs and hacks, etc. Yes, I see explicit work limits ignored too, with similar results. However, having worked both ways, my experience still leads me to believe that determining and visualizing WIP limits that closely reflect the current capabilities of the environment, then creating behaviors through policies to manage your workflow with respect to these limits, results in a more effective (less subjective) context for surfacing root causes of delay and poor quality, and for making and validating helpful changes.
But will workflows in the computer network or software development contexts and scenarios be as uniform or continuous as water? If not, how might this change things with using Little’s Law? In a software development context, I immediately thought about how context switching, if not managed (with workflow policies) as WIP increased, could make it difficult to keep Throughput constant.
For example, looking back at the second form of the equation again:
LT = WIP / Throughput
if context switching began occurring heavily as WIP increased, causing Throughput to decrease, then we’d see an increasing numerator over a decreasing denominator. In this case, when (if) the system became stable once again, plugging the new, higher long running average WIP and the new, lower long running average Throughput into Little’s Law gives a quotient, LT, that increases non-linearly, not proportionally, relative to the new long running average WIP. Again, when Throughput doesn’t remain constant but instead decreases as WIP increases, we’ll see a non-proportional increase in LT relative to the increase in WIP.
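To see the shape of that effect, here is a minimal Python sketch with a purely hypothetical Throughput model: each additional concurrent item costs a fixed fraction (`switch_cost`) of the team’s capacity to context switching. The model, its parameters, and the 2 items/day capacity are assumptions for illustration, not measurements.

```python
def throughput(wip, capacity=2.0, switch_cost=0.1):
    """Hypothetical model: completions/day shrink as concurrent items add switching overhead."""
    return capacity / (1 + switch_cost * (wip - 1))

def lead_time(wip, throughput_per_day):
    """Little's Law, second form: LT = WIP / Throughput."""
    return wip / throughput_per_day

for wip in (5, 10, 20):
    constant = lead_time(wip, 2.0)              # Throughput held at capacity
    degraded = lead_time(wip, throughput(wip))  # Throughput falls as WIP rises
    print(f"WIP={wip:>2}  LT(constant TH)={constant:5.2f} days  LT(switching)={degraded:5.2f} days")
```

With constant Throughput, quadrupling WIP from 5 to 20 items quadruples LT (2.5 to 10 days). Under the switching model, the same change takes LT from about 3.5 to about 29 days, more than eight times longer, because the numerator grows while the denominator shrinks. This particular model makes LT grow quadratically with WIP; it is only one possible shape, and a real workflow’s curve has to be measured.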
The key points from my email discussion with Arne can be summarized as follows: 1) it is important to know and understand the assumptions that Little’s Law is based on including the nuances for your context and scenario; 2) depending on your context and scenario, Little’s Law can yield a result for LT that is proportional to an increase in WIP as well as one that may be non-linear relative to an increase in WIP.
Speaking more specifically for the moment about the second form presented above, we need to make sure our system is stable, with long running averages for WIP and Throughput, before we can calculate a long running average for LT. If Throughput remains constant as WIP is increased to a new level, then the new value for LT will be an increase proportional to the increase in WIP. However, when an increase in WIP is accompanied by a decrease in Throughput (e.g., from context switching in a software development context and scenario), the new value for LT will be a non-proportional increase relative to the new, increased WIP level.
Over the next few days, though, I realized there is more to Arne’s initial question. I felt we didn’t dig into the first part of the question enough. While the key points summarized above about Little’s Law were helpful to discuss, they don’t touch on the core of why, in a computer network context, a non-linear decrease in speed occurs as utilization increases above 80%. In retrospect, maybe this wasn’t really at the core of Arne’s question to me either. Still, I kept wondering why and how the non-linear decrease in speed in computer networks occurs at higher utilization. More importantly, could this help me create and shape the policies that manage software development workflows at higher WIP levels? As you might guess by now, I’ve already dug into this question a bit and learned some interesting things I think would be helpful to capture and share, but this will definitely have to wait for another post.