“Here’s the bottom line: the number one driver for shipping products quicker is by focusing on the important ones and killing the unimportant ones.”
“You might be thinking: ‘True, but couldn’t we also increase the average completion rate?’ You’re right, but the impact of doing that is much lower than reducing the TIP (things-in-process) — that is, influencing the average completion rate is rather difficult and is often a function of available resources, scope creep, market demands and changes, etc.” – Pete Abilla, Nov 2006, Little’s Law for Product Development
A few weeks back Arne Roock (see his posts here), a fellow kanban/lean-agile practitioner, pinged me with a question related to Little’s Law and utilization. Paraphrasing, essentially it was “Queuing theory states that the speed of networks decreases dramatically (non-linearly) as utilization increases more than 80%. But according to Little’s Law (given a stable system), Lead Time increases linearly if we increase WIP (which increases utilization). Why doesn’t Little’s Law show lead time going up exponentially from a certain point on (ex. past 80% utilization)?” This resulted in exchanging a couple of emails discussing the use of Little’s Law, and why and how in the software development context an increase in work-in-progress could result in a non-linear increase in lead time. This post captures and reflects some of the thoughts we shared. My assumption is you’ve wondered too about similar questions. If so, I hope you’ll find this post interesting and helpful.
Little’s Law: In a Nutshell
First, let’s get on the same page with a quick review of Little’s Law.
Note: I see Little’s Law represented and labeled in various ways when applied to different contexts and scenarios. The versions here are based primarily on my personal application of it in the software development context.
Little’s Law is typically defined as follows:
L = λ W
L = average number of work items in a queuing system over a period of time
λ = average number of work items arriving into the queuing system over some unit time
W = average time a work item spends overall in the queuing system
Applying some basic algebra, Little’s Law is often re-organized and re-labeled for use in the software development context as follows:
LT = WIP / Throughput
LT = average “lead time” of how long it takes a work item to move through the workflow from start to delivery
WIP = average # of work items you’ll see in progress at any time in the workflow over some period of time of interest
Throughput = average # of work items (exiting) the workflow per some unit time
Observe the changes include substituting Throughput, an “average completion rate” (ACR), for an “average arrival rate” (AAR) in the original form. In the software development context, the work items are typically things like a new user story, a change request, or a bug fix.
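This second form is simple enough to compute directly. Here’s a minimal sketch (the numbers are made up purely for illustration):

```python
def lead_time(avg_wip, avg_throughput):
    """Little's Law: LT = WIP / Throughput (same work-item units for both)."""
    return avg_wip / avg_throughput

# e.g. an average of 20 work items in progress, completing 4 items per week
print(lead_time(20, 4))  # -> 5.0 weeks on average from start to delivery
```

The only trick, as noted above, is keeping the units consistent: if Throughput counts stories per week, WIP must count stories too.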
Note: See my earlier post here for a bit more discussion of Lead time versus Cycle time in a software development context and how it might impact a discussion on the duration or effort tracking.
Little’s Law: A Quick Test Drive
Now let’s take the second form of Little’s Law above for a quick “classroom context” test drive. It’s a simple ratio, so by holding Throughput (the denominator) constant, it’s easy to see that when WIP (the numerator) increases, LT (the quotient) will increase in a proportional manner. Nothing difficult about this classroom context and scenario, right?
But does the assumption that we can hold Throughput constant as WIP increases hold up often? We’ll revisit this question in a bit, but let’s also get on the same page with assumptions that Little’s Law is based on first.
Little’s Law: Assumptions
There are only a few, and here’s my short version: all three parameters in Little’s Law have to be in the “same units”, and each must be a “long-running” average in a “stable” system. The first assumption is easy. If you’re specifying Throughput in work items per day, then WIP needs to be measured in “equivalent” work items. For example, in a software development context, if you measure an average rate for Throughput using task-level work items completed per day, don’t use an average WIP measured in story-level work items.
However, the last two assumptions are a bit more complicated because there are nuances specific to different contexts and scenarios. Being aware nuances exist and understanding them is important, but discussing them in detail is beyond the scope of this post. For now, I’ll point you to this reference for more information on these nuances (see Ch. 5 – Little’s Law, by Dr. Stephen C. Graves, Professor of Mechanical Engineering and Engineering Systems at M.I.T.). Still, I’ll include just-enough-detail here to show why and how they are important.
Little’s Law: Long-Running Averages and a Stable System
Time to get our feet wet with Little’s Law and these assumptions using a simple real-world context and scenario. My sketch (click on image to enlarge) was inspired by an example that used water tanks to explain Little’s Law (unfortunately, the web page to the source is no longer available).
In the left tank, water arrives and exits at the same rate (2 gal. / hr.), so over time, Throughput, the water exiting remains constant. Also, the water level in the tank, WIP, remains constant over time (5 gal.). Since no water gets stuck to the side of the tank, over time the average time, LT, for any molecule (or a gal.) to pass through the tank is constant. Again, as long as we’re consistent for both Throughput and WIP, we can use units of molecules, or gals., or whatever is best for our context.
Applying Little’s Law we get 5 gal. / 2 gal. per hr. to yield a LT of 2.5 hrs. Let’s step through this too. If water stopped flowing into the tank, all the water would pass through in 2.5 hours based on the exit rate. Replacing the water at the same rate it exits simply keeps the water level at 5 gal. while water entering passes through the tank on average in 2.5 hours.
Now we’ll utilize more of the water tank (volume), increasing the flow in until it reaches a new water level (20 gal.), then easing the inflow back down to a rate again equal to the water exiting the tank (2 gal. / hr.). We keep the Throughput, water exiting the tank, constant during this time, so the same number of molecules (or gals.) are exiting the tank as before the water began arriving at a faster rate.
Note: I’m focusing on the case as we increase WIP since this is really the heart of the original question regarding Little’s Law and a system’s utilization as it increases. I acknowledge there is another side related to what happens when we lower WIP beyond a certain point, that I’m not addressing at all in this post.
While water arrives faster than it exits, WIP is not a long-running average, but is increasing over this time, right? Is the system in a stable state during this time? If we’re increasing the WIP of water in the tank, by Little’s Law, we’d expect an increasing average LT for our water molecules (or gals.), right? But, is it appropriate to use Little’s Law during this time that water is arriving faster than exiting?
Throughput was constant as WIP increased during this time. Still, in our simple water tank example, the system becomes “stable” again only when the flow of water arriving in the tank returns to a rate equal to the water exiting. Once the system is stable (again), then Little’s Law yields a new long-running average LT at this higher WIP. We see 20 gal. / 2 gal. per hr. yields a new long-running average LT of 10 hrs. The increase in LT is again proportional to the increase in WIP which is now four times greater. Note though there’s no mention of utilization levels in the assumptions that Little’s Law is based on.
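Plugging the two stable states into Little’s Law side by side makes the proportionality plain (a quick sketch using the tank numbers from the example above):

```python
def lead_time(wip_gal, throughput_gal_per_hr):
    # Little's Law: LT = WIP / Throughput
    return wip_gal / throughput_gal_per_hr

lt_before = lead_time(5, 2)   # 5 gal. at 2 gal./hr.  -> 2.5 hrs.
lt_after = lead_time(20, 2)   # 20 gal. at the same 2 gal./hr. -> 10.0 hrs.
print(lt_before, lt_after, lt_after / lt_before)  # WIP x4 gives LT x4
```

As long as Throughput truly stays constant, quadrupling WIP quadruples LT, nothing more.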
Little’s Law: Utilization Effect
So, is Little’s Law a “linear” relationship such that an increase in WIP always produces a proportional increase in LT? Let’s revisit that earlier question, can we assume we’ll hold Throughput constant in our second form of Little’s Law as we increase WIP?
The water tank example is a pretty simple context and scenario for using Little’s Law, as it models a fairly “continuous and uniform workflow.” The value of walking through this simple example, for me, is that it helps me see the importance of understanding the assumptions. The why and how regarding using the same units is pretty evident. Hopefully now, so is the why and how related to having long-running averages and a stable system before using Little’s Law.
Note: for me, another critical point to catch here is the importance of a WIP limit. Even setting just one at the overall system level to start, even if it appears “impractically high” at first, helps create and maintain a stable system, enabling the (effective) use of this helpful tool (Little’s Law) for learning, understanding, managing, and more effectively validating whether changes improved your workflow.
When someone says they’re not using “explicit” work limits, claims they’re “successfully” delivering for the most part on time, and yet has a clear, sizeable backlog of pending work, after “digging” a little deeper I commonly find one or both of these things.
The first: someone (usually an experienced, skillful project manager) is employing “implicit” work limits, effectively deferring some work items, often by reducing the scope or fidelity of features, and making efforts to manage expectations and mitigate visible risks. This is “good project management” in my opinion, but I don’t see it often. The second: quality really could be much better, and a significant amount of “hero effort” is needed.
The disadvantage I observe with “limiting WIP” in an implicit way is that the lack of (or poor) visibility contributes yet even more to a process being subjective and often more political; it enables more circumventing, encourages more context switching, often requires more “unplanned” overtime, often produces more bugs and hacks, etc. Yes, I see explicit work limits ignored too, with similar results. However, having worked both ways, my experience still leads me to believe that determining and visualizing WIP limits that closely reflect the current capabilities of the environment, then creating behaviors through policies to manage your workflow with respect to these limits, results in a more effective (less subjective) context for surfacing root causes of delay and poor quality, and for making and validating helpful changes.
But will workflows in the computer network or software development contexts and scenarios be as uniform or continuous as water? If not, how might this change things with using Little’s Law? In a software development context, I immediately thought about how context switching, if not managed (with workflow policies) as WIP increased, could make it difficult to keep Throughput constant.
For example, looking back at the second form of the equation again:
LT = WIP / Throughput
if context switching began occurring heavily as WIP increased, causing Throughput to decrease, then we’d see an increasing numerator over a decreasing denominator. In this case, when (if) the system became stable once again, plugging the new higher long-running average WIP along with the new lower long-running average Throughput into Little’s Law, the quotient, LT, will increase non-linearly, not proportionally, relative to this new long-running average WIP. Again, when Throughput doesn’t remain constant but instead decreases as WIP increases, we’ll see a non-proportional increase in LT relative to the increase in WIP.
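To make this concrete, here’s a toy model (entirely my own assumption, not part of Little’s Law): suppose Throughput holds steady up to some WIP threshold, then degrades as context switching kicks in. Comparing the resulting long-running averages shows the non-proportional jump in LT:

```python
def throughput(wip, base_rate=4.0, threshold=10, penalty=0.05):
    """Hypothetical: each in-progress item over the threshold costs 5%
    of the base completion rate to context switching."""
    if wip <= threshold:
        return base_rate
    return max(base_rate * (1 - penalty * (wip - threshold)), 0.1)

# At the threshold: LT = 10 / 4.0 = 2.5
# Double the WIP: throughput halves, so LT = 20 / 2.0 = 10.0
# WIP went up 2x, but LT went up 4x -- a non-proportional increase.
for wip in (10, 20):
    print(wip, throughput(wip), wip / throughput(wip))
```

The exact threshold and penalty numbers are invented; the point is only that any Throughput that falls as WIP rises makes LT grow faster than WIP.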
The key points from my email discussion with Arne can be summarized as follows: 1) it is important to know and understand the assumptions that Little’s Law is based on including the nuances for your context and scenario; 2) depending on your context and scenario, Little’s Law can yield a result for LT that is proportional to an increase in WIP as well as one that may be non-linear relative to an increase in WIP.
Speaking more specifically for the moment about the second form presented above, we need to make sure our system is stable, with long-running averages for WIP and Throughput, before any calculation of a long-running average for LT can be made. If Throughput remained constant as WIP was increased to a new level, then the new value for LT will be an increase proportional to the increase in WIP. However, when an increase in WIP is accompanied by a decrease in Throughput (ex. context switching in a software development context and scenario), the new value for LT will be a non-proportional increase relative to the new increased WIP level.
Over the next few days though, I realized there is more to Arne’s initial question. I felt we didn’t dig into the first part of the question enough. Yes, while these key points summarized above related to Little’s Law were helpful to discuss, they don’t touch on the core of why, in a computer network context, a non-linear decrease in speed occurs as utilization increases above 80%. In retrospect, maybe this wasn’t really at the core of Arne’s question to me either. Still, I wondered more about why and how the non-linear decrease of speed in computer networks occurs at higher utilization. More importantly, could this be helpful to me in creating and shaping the policies that manage software development workflows at higher WIP levels? As you might guess by now, I’ve dived into this question too a bit already and learned some interesting things I think would be helpful to capture and share, but this definitely will have to wait for another post.
NOTE: I’m thrilled to see that after being posted years ago, my blog posts related to Little’s Law continue to remain consistently viewed month to month, including this one, by dozens of folks each week. So, as one who also focuses a lot on “Learning to Learn”, I’d love to hear a little bit on how you have used/plan to use Little’s Law in your specific context, and I’d be happy to also share a bit more with you on how I’ve used it as well!
20200605 UPDATE: first, if you are reading this, Thank You for sticking with the post all the way to the end. I hope you found it helpful in improving your understanding of Little’s Law and that it helps with managing your workflows (all kinds).
The experience gained from digging deep into Little’s Law, learning about the assumptions it is based on, led me also to understand how significant the policies (see here) that influence and govern product development workflows, and perhaps even more so service delivery workflows, are to creating stable systems and predictable delivery. (Note: I must acknowledge and thank Dan Vacanti; when we started together on this deep dive neither of us knew where our early conversations long ago about Little’s Law would lead us, and we invested numerous long conversations together during this exploration, but it was definitely worthwhile.)
Perhaps more importantly though, I hope this post highlights how critical it is to know and understand the assumptions that (analysis) tools are based on (including theoretical probability distributions, which are based on assumptions too). In particular, this applies to any tools you are using to govern your current product development and service delivery workflow processes and to guide your process improvement decisions for them. That may sound like a tall order; however, the alternative of not doing so is not good either (ex. lots of wasted time, or worse, never pivoting to a more helpful direction). 🙁
More recently I have been looking into process behavior charts, including the assumptions they are based on and more deeply how and why they work. Similar to my exploration into Little’s Law, this more recent exploration of PBCs and in particular through the related work by Dr. Donald J. Wheeler, has also been very helpful. Not just related to understanding more about improving workflows, but more broadly about data analysis, “continuous improvement”, and “lean thinking.”
Surprisingly, there is a link between these two explorations that relates to what I am interested in, which is “systems and processes” (ex. product development and service delivery) in a broader sense that is helpful to understanding what predictability is in terms of a system and the analysis of that system, what is necessary for predictability, and how to improve it. I hope to get time soon to further develop and capture these thoughts. In the meantime, here is my first post related to this latest exploration (see here). (Note: I must acknowledge and thank Dan Vacanti; just as neither of us knew where our early conversations about Little’s Law would lead us, we see again, this latest exploration is requiring numerous long conversations as well, definitely worthwhile, but more are needed. Thanks to Dr. Wheeler as well (see his site here), who has been very helpful in shaping and guiding this latest exploration.)
Okay, it’s been five years, but it’s only now that I’ve realized I’ve never posted the link to the German translation here :-)
I love the post, but now with the German translation and a tweet coming with it, I do not like the title any more 😉
My take is: LL has (as every law) conditions under which it holds. And, under these conditions it holds. (I just made the tautology explicit 😉) There are conditions under which it doesn’t hold. Then it does not make any statement. Then the observed effects can be linear or nonlinear. They might be polynomial, exponential. Or … linear. No statement.
So, I think the right title might be ‘Does LL hold under your system conditions?’ Or: ‘Is your system invariant over time, as otherwise, LL does not make a statement about’ etc.
The baseline is: LL in its original form is an integral over time. So, roughly, the system needs to be stable over the observed time, otherwise no LL. That means if the WIP increase is drastically changing the system itself, or ‘if there is a correlation between the system status and its WIP’: NO LL.
The title suggests there is some ‘scientific sensation’ here, which there isn’t – it is rather science applied to the real world. It is basically the core of what I referred to with my Pecha Kucha at #LKCE 🙂
But again: It IS a great post, it just puts a wrong conclusion (you didn’t even make) at the beginning.
I nearly didn’t get the captcha solved 😮
Thanks and all the best
I’m glad to see Arne’s translation effort resulted in more conversation, and I like the essence of what you’ve captured in your comment above too. The post’s title certainly has its faults, and yet it invites questioning and provokes discussion rather than providing “conclusions.” ☺
In Aug of 2011, I started discussing my own questions re: LL and CFDs with Dan Vacanti, and these initial discussions lead to others and lots more questions, and the discussions and my learning continues.
The Kanban Metrics tutorials we’ve given at LSSC2012 (Boston) and at LKCE2012 (Vienna), and Dan’s talk at LKCE2012 have amplified the discussion even further, and a number of us engaged in it still more at KLRUS (San Diego) last month. Arne and I knew a single blog post would be insufficient for this topic, and there is definitely more to be discussed, understood, learned, and benefited from re: LL.
Thanks for your comments and tweets and I look forward to seeing you and Arne again, and perhaps over a good meal and glass of wine we’ll continue our conversation.
Thanks for pointing out that Little’s Law is not as easy as some people present it. I usually add a third complication to those you discussed here, which explains why a WiP Limit of 1 is not always the best choice, even though LL suggests so if you look at it naively:
LL comes from Queueing Theory, which examines the *queues* that pile up in front of a server that uses a certain process with certain statistical behaviour. Though LL is true *regardless* of this statistical behaviour, it does not hold anymore if the process changes. Applied to Software Kanban, this means that LL is a good guideline as long as you manage queues. If the WiP limit starts to affect the process itself, we have dependent variables in LL. Since this interdependence may be non-linear, we probably run into the realm of Complex Systems with all their unpredictable behaviour (e.g. people may start to work on their own projects if they are significantly under-utilized, some controller may run havoc, or whatever). Hence LL is a great tool to manage queues. But it doesn’t mean that a Complex (Adaptive?) System suddenly starts to behave linearly.
Here’s how the “high utilization means grinding to a halt” effect more or less works. All times are averages and the system is stable; no context switching is allowed.
The model is a cashier line: a queue where you spend time waiting and a station where you spend time paying.
Consider the station itself. Its utilization is:
U = lambda * S
where lambda is the rate of arrival and S the service time for that station (how much it takes to pay when there’s no one in the queue, let’s say).
Consider the full system:
L = lambda * W
where L is the people in the system at a certain time, and W is the total waiting time.
Now, the total waiting time is composed of the time you spend in the queue plus the time you spend at the station:
W = L * S + S
since when you arrive, you have to wait until (on average) L customers are served, and then be served yourself.
So substituting L in this equation, you get:
W = lambda * W * S + S
W (1 – lambda * S) = S
W = S / (1 – lambda * S)
W = S / (1 – U)
so when U goes up, near to 1, the denominator approaches 0 and the total time you spend in the system approaches infinity.
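Tabulating W = S / (1 − U) for a few utilization levels shows how steep the curve gets (a quick sketch; S = 2 minutes is just an illustrative service time):

```python
def time_in_system(service_time, utilization):
    # W = S / (1 - U); only meaningful for a stable system with U < 1
    assert 0.0 <= utilization < 1.0
    return service_time / (1.0 - utilization)

S = 2.0  # minutes to pay when there's no one in the queue (illustrative)
for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(u, round(time_in_system(S, u), 1))
# 0.5 -> 4.0, 0.8 -> 10.0, 0.9 -> 20.0, 0.95 -> 40.0, 0.99 -> 200.0 minutes
```

Note the doubling pattern: each halving of the remaining headroom (1 − U) doubles the total time in the system.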
My intuitive explanation: if we were able to perfectly time customers’ arrivals, we would be able to arrive at U = 100%: they would arrive every S seconds, be served, and when they exit, another one would be ready. But in this model we are talking about average times, so they arrive randomly: sometimes early, sometimes late. If they arrive early, they wait in the queue for a bit, so L goes up; if they arrive late, at certain times the queue is empty and we lose a bit of utilization because no one can be served during that time.
Thanks for zooming in on the term “stable system”. Viewing a “system under change” as a “currently not stable system” is interesting.
But doesn’t that mean any system where work item size is not uniform is not a stable system? That way Arne’s original question would have compared apples with pears:
- Apples: Software development described by LT = WIP / TP. A stable system by definition. That’s what the formula applies to. Implicit assumption: work item size is constant.
- Pears: The queueing theory network with non-linear TP decrease above 80% utilization. An implicitly unstable system, since “packet size” is not defined and thus can vary greatly.
To me that would mean, the first question to ask about a system is: can the work item/packet size be assumed to be pretty much constant?
If no, the system is not stable, non-linear effects can be expected, and LT = WIP / TP cannot be readily applied.
If yes, other factors have to be checked to determine stableness.
I’m sure my reply won’t do this great question you bring up here justice; as Arne suggests in his earlier comment, any discussion about Little’s Law provides an opportunity for part 2, 3, 4, etc. In short, if I understand your question correctly, I’d suggest you don’t need “uniform size” work items in order to produce a “stable long-running average” for, say, lead time. The average is simply that, an “average.” Alone, this average is useful, but one must still be aware of the data set that generated it as well as the system context that generated the data. That said, I hear this come up again and again, so maybe I’m not understanding correctly why others feel this is required, and I’m certainly open to discussing more.
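To test that intuition, here’s a rough single-server FIFO simulation (entirely my own toy sketch, with invented numbers): item “sizes” (service times) vary a lot, yet as long as arrivals are slow enough to keep the system stable, the long-running average lead time still settles to a steady value:

```python
import random

random.seed(7)

def avg_lead_time(n_items, mean_interarrival=10.0, sizes=(1.0, 5.0, 20.0)):
    """Average lead time through a single-server FIFO queue with
    deliberately non-uniform work-item sizes."""
    clock = 0.0            # current arrival time
    server_free_at = 0.0   # when the single server next becomes idle
    total_lead_time = 0.0
    for _ in range(n_items):
        clock += random.expovariate(1.0 / mean_interarrival)
        size = random.choice(sizes)          # non-uniform item sizes
        start = max(clock, server_free_at)   # FIFO: wait for the server
        server_free_at = start + size
        total_lead_time += server_free_at - clock
    return total_lead_time / n_items

# Mean size ~8.67 vs. mean interarrival 10 -> utilization < 1 (stable),
# so the average converges despite the widely varying item sizes.
print(avg_lead_time(100_000))
```

The variance in sizes does inflate the average (and makes any individual wait less predictable, as the express-lane point in the follow-up comment notes), but it does not by itself prevent a stable long-running average from existing.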
@Frank: Sure, an average is just that, an average 🙂
Still, though, as a customer waiting in a supermarket cashier line, an average waiting time does not really make me happy if right before me is a guy with his cart filled up to the brim. I know for sure my personal waiting time will be way above the average.
That’s why there are express lanes, I’d say. To lower the average. And to make waiting time more predictable on a personal level.
Coming back to the 80% utilization: if a network is utilized only 10% or 40%, even a large change in work item size does not affect the ability to take up more work. Maybe just one more work item then leads to a 65% utilization.
But above 80%, a possible, even likely, variation in work item size might lead to network overload. That’s when things break down.
So I’d say it’s important to know not only the average work item size, but also the variance in work item size. If it’s small, i.e. work items are of pretty much the same size, then all’s dandy. But if the variance is large… then there is a real danger of exceeding buffer capacity once utilization is high.
But maybe I’m misunderstanding a crucial point?
Thanks for the follow-up. I think your understanding is fine. That is, I agree with your second comment completely :>)
Still, from your first comment, I have to admit I’m stumbling on the following:
“But doesn’t that mean, any system where work item size is not uniform is not a stable system?”
“To me that would mean, the first question to ask about a system is: can the work item/packet size be assumed to be pretty much constant? If no, the system is not stable and non-linear effects can be expected and LT=WIP/TP cannot be readily applied.”
It could be me just missing something here.
excellent blog post, thank you! And I’m looking forward to parts 2, 3, 4 and 5 😉
Thank you!! Yes, there is plenty of opportunity for follow-up conversations and posts on this topic :>) I look forward to catching up some with you on this at LKCE2012 as well.