“Here’s the bottom line: the number one driver for shipping products quicker is by focusing on the important ones and killing the unimportant ones.”

“You might be thinking: ‘True, but couldn’t we also increase the average completion rate’? You’re right, but the impact of doing that is much lower than reducing the TIP (things-in-process) — that is, influencing the average completion rate is rather difficult and is often a function of available resources, scope creep, market demands and changes, etc.”– Pete Abilla, Nov 2006, Little’s Law for Product Development

A few weeks back Arne Roock (see his posts here), a fellow kanban/lean-agile practitioner, pinged me with a question related to Little’s Law and utilization. Paraphrasing, essentially it was **“Queuing theory states that the speed of networks decreases dramatically (non-linearly) as utilization increases more than 80%. But according to Little’s Law (given a stable system), Lead Time increases linearly if we increase WIP (which increases utilization). Why doesn’t Little’s Law show lead time going up exponentially from a certain point on (ex. past 80% utilization)?”** This resulted in exchanging a couple of emails discussing the use of Little’s Law, and why and how in the software development context an increase in work-in-progress could result in a non-linear increase in lead time. This post captures and reflects some of the thoughts we shared. My assumption is you’ve wondered about similar questions too. If so, I hope you’ll find this post interesting and helpful.

**Little’s Law: In a Nutshell**

First, starting out we’ll get on the same page with a quick review of Little’s Law.

Note: I see Little’s Law represented and labeled various ways when applied to different contexts and scenarios. The versions here are based primarily on my personal application of it in the software development context.

Little’s Law is typically defined as follows:

*L* = λ *W*

Where:

*L* = average number of work items in a queuing system over a period of time

λ = average number of work items arriving into the queuing system over some unit time

*W* = average time a work item spends overall in the queuing system

Applying some basic algebra, Little’s Law is often re-organized and re-labeled for use in the software development context as follows:

*LT* = *WIP* / *Throughput*

Where:

*LT* = average “lead time” of how long it takes a work item to move through the workflow from start to delivery

*WIP* = average # of work items you’ll see in progress at any time in the workflow over some period of time of interest

*Throughput* = average # of work items (exiting) the workflow per some unit time

Observe the changes include substituting *Throughput*, an “average completion rate” (ACR), for an average arrival rate in the original form. In the software development context the work items are typically things like a new user story, a change request, or a bug fix.

Note: See my earlier post here for a bit more discussion of Lead time versus Cycle time in a software development context and how it might impact a discussion on duration or effort tracking.

### Little’s Law: A Quick Test Drive

Now let’s take the second form of Little’s Law above for a quick “classroom context” test drive. It’s a simple ratio, so by holding *Throughput* (the denominator) constant, it’s easy to see that when *WIP* (the numerator) increases, *LT* (the quotient) will increase in a proportional manner. Nothing difficult about this classroom context and scenario, right?

*But does the assumption that we can hold Throughput constant as WIP increases hold up often?* We’ll revisit this question in a bit, but first let’s also get on the same page with the assumptions that Little’s Law is based on.
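The “classroom” case can be sketched in a few lines of Python. This is only an illustration: the helper name `lead_time` and the sample numbers (a *Throughput* of 2 work items per day, and the three *WIP* levels) are assumptions made up for the example, not measurements from a real workflow.

```python
def lead_time(wip, throughput):
    """Second form of Little's Law: LT = WIP / Throughput.

    Both inputs are assumed to be long running averages from a stable
    system, measured in the same kind of work item.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical numbers: Throughput is held constant while WIP grows.
throughput_per_day = 2.0
for wip in (4, 8, 16):
    lt = lead_time(wip, throughput_per_day)
    print(f"WIP={wip:2d} items  Throughput={throughput_per_day}/day  LT={lt} days")
```

With the denominator fixed, each doubling of *WIP* doubles *LT*, which is all that “proportional” means here.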

### Little’s Law: Assumptions

There are only a few, and here’s my short version: all three parameters in Little’s Law have to be in the “same units”, and each must be a “long running” average in a “stable” system. The first assumption is easy. If you’re specifying *Throughput* in work items per day, then *WIP* needs to be measured in “equivalent” work items. For example, in a software development context, if you measure an average rate for *Throughput* using task level work items completed per day, don’t use an average *WIP* measured in story level work items.

However, the last two assumptions are a bit more complicated because there are nuances specific to different contexts and scenarios. Being aware nuances exist and understanding them is important, but discussing them in detail is beyond the scope of this post. For now, I’ll point you to this reference for more information on these nuances (see Ch. 5 – Little’s Law, by Dr. Stephen C. Graves, Professor of Mechanical Engineering and Engineering Systems at M.I.T.). Still, I’ll include just-enough-detail here to show why and how they are important.

### Little’s Law: Long Running Averages and a Stable System

Time to get our feet wet with Little’s Law and these assumptions using a simple real world context and scenario. My sketch was inspired by an example that used water tanks to explain Little’s Law (unfortunately, the source web page is no longer available).

In the left tank, water arrives and exits at the same rate (2 gal./hr.), so over time *Throughput*, the water exiting, remains constant. Also, the water level in the tank, *WIP*, remains constant over time (5 gal.). Since no water gets stuck to the side of the tank, the average time, *LT*, it takes for any one molecule (or gal.) to pass through the tank is constant over time. Again, as long as we’re consistent for both *Throughput* and *WIP*, we can use units of molecules, or gals., or whatever is best for our context.

Applying Little’s Law we get 5 gal. / 2 gal. per hr. to yield an *LT* of 2.5 hrs. Let’s step through this too. If water stopped flowing into the tank, all the water would pass through in 2.5 hours based on the exit rate. Replacing the water at the same rate it exits simply keeps the water level at 5 gal. while water entering passes through the tank on average in 2.5 hours.

Now we’ll utilize more of the water tank, increasing the flow in until it reaches the new water mark (20 gal.), then backing off the flow going in to a rate again equal to the water exiting the tank (2 gal./hr.). We keep the *Throughput*, water exiting the tank, constant during this time, so the same number of molecules (or gals.) exits the tank per hour as before the water began arriving at a faster rate.

Note: I’m focusing on the case as we increase *WIP*, since this is really the heart of the original question regarding Little’s Law and a system’s utilization as it increases. I acknowledge there is another side related to what happens when we lower *WIP* beyond a certain point, which I’m not addressing at all in this post.

While water arrives faster than it is exiting, *WIP* is not a long running average, but is increasing over this time, right? Is the system in a stable state during this time? If we’re increasing the *WIP* of water in the tank, by Little’s Law, we’d expect an increasing average *LT* for our water molecules (or gals.), right? But is it appropriate to use Little’s Law during this time that water is arriving faster than exiting?

*Throughput* was constant as *WIP* increased during this time. Still, **in our simple water tank example the system becomes “stable” again only when the flow of water arriving in the tank returns to a rate equal to the water exiting.** Once the system is stable, Little’s Law yields a new long running average *LT* at this higher *WIP*. We see 20 gal. / 2 gal. per hr. yields a new long running average *LT* of 10 hrs. The increase in *LT* is again proportional to the increase in *WIP*, which is now four times greater.
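The two stable states of the water tank can be checked with a small simulation. This is a minimal sketch under stated assumptions: water is treated as first-in, first-out “parcels”, one parcel arrives and one exits per time step, and the function name `simulate_tank` and the step size `dt` are hypothetical choices made for illustration.

```python
from collections import deque


def simulate_tank(level_gal, rate_gal_per_hr, hours=100.0, dt=0.1):
    """Average time (hours) a parcel of water spends in a stable FIFO tank.

    Assumes the inflow equals the outflow (a stable system), so the water
    level stays at level_gal for the whole run.
    """
    parcel_gal = rate_gal_per_hr * dt           # gallons moved per time step
    n_initial = round(level_gal / parcel_gal)   # parcels already in the tank
    tank = deque([None] * n_initial)            # None = entry time unknown
    residence_times = []
    for step in range(round(hours / dt)):
        now = step * dt
        tank.append(now)                        # one parcel arrives ...
        entered = tank.popleft()                # ... and the oldest one exits
        if entered is not None:                 # skip the initial fill
            residence_times.append(now - entered)
    return sum(residence_times) / len(residence_times)


for level in (5, 20):
    measured = simulate_tank(level, 2)
    predicted = level / 2                       # Little's Law: LT = WIP / Throughput
    print(f"level={level} gal  measured LT={measured:.2f} h  predicted LT={predicted:.2f} h")
```

The sketch deliberately models only the two stable states. It does not model the filling period between them, which is exactly the period where the long running average assumption does not hold.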

**Note though there’s no mention of utilization levels in the assumptions that Little’s Law is based on.**

**Little’s Law: Utilization Effect**

So, is Little’s Law a “linear” relationship such that an increase in *WIP* always produces a proportional increase in *LT*? Let’s revisit that earlier question: can we assume we’ll hold *Throughput* constant in our second form of Little’s Law as we increase *WIP*?

The water tank example is a context and scenario for using Little’s Law that is pretty simple, as it models a fairly “continuous and uniform workflow.” The value of walking through this simple example for me is that it helps me see the importance of understanding the assumptions. **The why and how regarding using the same units is pretty evident. Hopefully now, so is the why and how related to having long running averages and a stable system before using Little’s Law.**

Note: For me, another critical point to catch here is the importance of a *WIP* limit. Even just one, set at the overall system level to start and even if it appears “impractically high” at first, helps create and maintain a stable system, enabling the (effective) use of this helpful tool (Little’s Law) for learning, understanding, managing, and more effectively validating if changes improved your workflow.

When someone says they’re not using “explicit” work limits, claims they’re “successfully” delivering for the most part on time, and there exists a clear, sizeable backlog of pending work, after “digging” a little deeper I commonly find one or both of these things.

The first, someone (usually an experienced, skillful project manager) is employing “implicit” work limits, by effectively deferring some work items often through reducing scope or fidelity of features, and making efforts to manage expectations and mitigating visible risks. This is “good project management” in my opinion, but I don’t see this often. The second is, quality really could be much better and there is a significant amount of “hero effort” needed.

The disadvantage I observe with the implicit way is that the lack of visibility, or poor visibility, contributes even more to a process being subjective and often more political; it enables more circumventing, encourages more context switching, often requires more “unplanned” overtime, often produces more bugs and hacks, etc. Yes, I see explicit work limits ignored too, with similar results. However, having worked both ways, my *experience* still leads me to *believe* that determining and visualizing *WIP* limits that closely reflect current capabilities of the environment, then creating *behaviors* through policies to manage your workflow with respect to these limits, *results* in a more effective (less subjective) context for surfacing root causes of delay and poor quality, and for making and validating helpful changes.

But will workflows in the computer network or software development contexts and scenarios be as uniform or continuous as water? If not, how might this change things with using Little’s Law? In a software development context, I immediately thought about how context switching, if not managed (with workflow policies) as *WIP* increased, could make it difficult to keep *Throughput* constant.

For example, looking back at the second form of the equation again:

*LT* = *WIP* / *Throughput*

if context switching began occurring heavily as *WIP* increased, causing *Throughput* to decrease, then we’d see an increasing numerator over a decreasing denominator. In this case, when (if) the system became stable once again, plugging the new higher long running average *WIP* and the new lower long running average *Throughput* into Little’s Law, the quotient, *LT*, will increase non-linearly, not proportionally, relative to this new long running average *WIP*. Again, **when Throughput doesn’t remain constant but instead decreases as WIP increases, we’ll see a non-proportional increase in LT relative to the increase in WIP.**
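To make the effect concrete, here is a small sketch that compares the two cases. The context switching model in `throughput_with_switching` is entirely made up for illustration (the parameters `free_wip` and `penalty` are hypothetical and not derived from any real team); the only point is that any *Throughput* that falls as *WIP* rises will produce a more than proportional rise in *LT*.

```python
def lead_time(wip, throughput):
    """Second form of Little's Law: LT = WIP / Throughput."""
    return wip / throughput


def throughput_with_switching(wip, base_throughput=2.0, free_wip=5, penalty=0.05):
    """Hypothetical model: every work item beyond free_wip costs some
    capacity to context switching, so Throughput falls as WIP rises."""
    extra_items = max(0, wip - free_wip)
    return base_throughput / (1.0 + penalty * extra_items)


# Compare the stable states before and after WIP is raised.
for wip in (5, 10, 20):
    constant_case = lead_time(wip, 2.0)
    switching_case = lead_time(wip, throughput_with_switching(wip))
    print(f"WIP={wip:2d}  LT (constant Throughput)={constant_case:5.2f}  "
          f"LT (with context switching)={switching_case:5.2f}")
```

Under these assumed parameters, raising *WIP* fourfold (from 5 to 20) raises *LT* fourfold when *Throughput* is constant, but sevenfold when context switching erodes *Throughput*. Little’s Law holds in both stable states; it is the inputs that changed.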

**Summary**

The key points from my email discussion with Arne can be summarized as follows: **1) it is important to know and understand the assumptions that Little’s Law is based on including the nuances for your context and scenario; 2) depending on your context and scenario, Little’s Law can yield a result for LT that is proportional to an increase in WIP as well as one that may be non-linear relative to an increase in WIP.**

Speaking more specifically for the moment about the second form presented above, we need to make sure our system is stable with long running averages for *WIP* and *Throughput* before any calculation of a long running average for *LT* can be made. If *Throughput* remained constant as *WIP* was increased to a new level, then the new value for *LT* will be an increase proportional to the increase in *WIP*. However, when an increase in *WIP* is accompanied by a decrease in *Throughput* (ex. context switching in a software development context and scenario), the new value for *LT* will be a non-proportional increase relative to the new increased *WIP* level.

Over the next few days though I realized there is more to Arne’s initial question. I felt we didn’t dig into the first part of the question enough. Yes, while the key points summarized above related to Little’s Law were helpful to discuss, they don’t touch on the core of why in a computer network context a non-linear decrease in speed occurs as utilization increases above 80%. In retrospect, maybe this wasn’t really at the core of Arne’s question to me either. **Still, over the next few days I wondered more about why and how the non-linear decrease of speed in computer networks occurs at higher utilization. More importantly, could this be helpful to me in creating and shaping the policies that manage software development workflows at higher WIP levels?** As you might guess by now, I’ve dived into this question a bit already and learned some interesting things I think would be helpful to capture and share, but this definitely will have to wait for another post.

Take care,

Frank

December 17th, 2012 on 1:19 am

Hi Frank,

I love the post, but now with the German translation and a tweet coming with it, I do not like the title any more 😉

My take is: LL has (as every Law) conditions under which it holds. And, under these conditions it holds. (I just made the tautology explicit 😉 There are conditions under which it doesn’t hold. Then it does not make any statement. Then the observed effects can be linear or nonlinear. They might be polynomial, exponential. Or … linear. No statement.

So, I think the right title might be ‘Does LL hold under your system conditions?’ Or: ‘Is your system invariant over time, as otherwise, LL does not make a statement about’ etc.

The baseline is: LL in its original form is an integral over time. So, roughly, the system needs to be stable over the observed time, otherwise no LL. That means if the WIP increase is drastically changing the system itself, or ‘if there is a correlation between the system status and its WIP’: NO LL.

The title suggests there is some ‘scientific sensation’ here, which there isn’t – it is rather science applied to the real world. It is basically the core of what I referred to with my Pecha Kucha at #LKCE 🙂

But again: It IS a great post, it just puts a wrong conclusion (you didn’t even make) at the beginning.

I nearly didn’t get the captcha solved 😮

Thanks and all the best

Markus

December 17th, 2012 on 5:39 pm

Hi Markus,

I’m glad to see Arne’s translation effort resulted in more conversation and I like the essence too of what you’ve captured in your comment above. The post’s title for sure is with faults, and yet invites questioning and provokes discussion rather than provides “conclusions.” ☺

In Aug of 2011, I started discussing my own questions re: LL and CFDs with Dan Vacanti, and these initial discussions led to others and lots more questions, and the discussions and my learning continue.

The Kanban Metrics tutorials we’ve given at LSSC2012 (Boston) and at LKCE2012 (Vienna), and Dan’s talk at LKCE2012 have amplified the discussion even further, and a number of us engaged in it still more at KLRUS (San Diego) last month. Arne and I knew a single blog post would be insufficient for this topic, and there is definitely more to be discussed, understood, learned, and benefited from re: LL.

Thanks for your comments and tweets and I look forward to seeing you and Arne again, and perhaps over a good meal and glass of wine we’ll continue our conversation.

Take care,

Frank

September 19th, 2012 on 3:23 am

Thanks for pointing out that Little’s Law is not as easy as some people present it. I usually add a third complication to those you discussed here, one that explains why a WiP limit of 1 is not always the best choice, even though LL suggests so if you look at it naively:

LL comes from Queueing Theory, which examines the *queues* that pile up in front of a server that uses a certain process with certain statistical behaviour. Though LL is true *regardless* of this statistical behaviour, it does not hold anymore if the process changes. Applied to Software Kanban this means that LL is a good guideline as long as you manage queues. If the WiP limit starts to affect the process itself, we have dependent variables in LL. Since this interdependence may be non-linear, we probably run into the realms of Complex Systems with all their unpredictable behaviour (e.g. people may start to work on their own projects if they are significantly under-utilized, some controller may cause havoc, or whatever). Hence LL is a great tool to manage queues. But it doesn’t mean that a Complex (Adaptive?) System suddenly starts to behave linearly.

Take care

Jens

September 18th, 2012 on 7:49 am

Here’s how the “high utilization means grinding to a halt” more or less works. All times are averages and the system is stable; no context switch is allowed.

The model is a cashier line: a queue where you spend time waiting and a station where you spend time paying.

Consider the station itself. Its utilization is:

U = lambda * S

where lambda is the rate of arrival and S the service time for that station (how much it takes to pay when there’s no one in the queue, let’s say).

Consider the full system:

L = lambda * W

where L is the people in the system at a certain time, and W is the total waiting time.

Now the total waiting time is composed of the time you spend in queue, plus the time you spend at the station:

W = L * S + S

since when you arrive, you have to wait until (on average) L customers are served, and then be served yourself.

So substituting L in this equation, you get:

W = lambda * W * S + S

W (1 – lambda * S) = S

W = S / (1 – lambda * S)

W = S / (1 – U)

so when U goes up, near to 1, the denominator approaches 0 and the total time you spend in the system approaches infinity.

My intuitive explanation: if we were able to perfectly time customers’ arrivals we would be able to arrive at U = 100%: they would arrive every S seconds, be served, and when they exit another one would be ready. But in this model we are talking about average times, so they arrive randomly: sometimes early, sometimes late. If they arrive early, they wait in the queue for a bit, so their L goes up; if they arrive late, at certain times the queue is empty and we lose a bit of utilization because no one can be served during that time.
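The last formula in the derivation, W = S / (1 – U), can be tabulated to see where the curve bends. This is only a sketch of that one formula for the single cashier model described in this comment; the function name `time_in_system`, the service time of 1 minute, and the sample utilization values are hypothetical choices for illustration.

```python
def time_in_system(service_time, utilization):
    """W = S / (1 - U) for the single cashier model with random arrivals.

    Only defined for 0 <= U < 1; at U = 1 the average time is unbounded.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


service_time_min = 1.0  # assumed average time to pay, in minutes
for u in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
    w = time_in_system(service_time_min, u)
    print(f"U={u:.0%}  W={w:6.1f} min")
```

Going from 50% to 80% utilization multiplies the time in the system by 2.5, and going from 80% to 95% multiplies it again by 4, even though the service time itself never changed.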

September 9th, 2012 on 2:19 am

Thanks for zooming in on the term “stable system”. Viewing a “system under change” as a “currently not stable system” is interesting.

But doesn´t that mean, any system where work item size is not uniform is not a stable system? That way Arne´s original question would have compared apples with pears:

-Apples: Software development described by LT=WIP/TP. A stable system by definition. That´s what the formula applies to. Implicit assumption: work item size is constant.

-Pears: The queueing theory network with non-linear TP decrease above 80% utilization. An implicitly non stable system, since “packet size” is not defined and thus can vary greatly.

To me that would mean, the first question to ask about a system is: can the work item/packet size be assumed to be pretty much constant?

If no, the system is not stable and non linear effects can be expected and LT=WIP/TP cannot be readily applied.

If yes, other factors have to be checked to determine stableness.

-Ralf

September 10th, 2012 on 10:18 pm

Hi Ralf,

I’m sure my reply won’t do this great question you bring up here justice. As Arne suggests in his earlier comment, any discussion about Little’s Law provides an opportunity for part 2, 3, 4, etc. In short, if I understand your question correctly, I’d suggest you don’t need “uniform size” work items in order to produce a “stable long running average” for, say, lead time. The average is simply that, an “average.” Alone, this average is useful, but still one must be aware of the data set that generated it as well as the system context that generated the data. That said, I hear this come up again and again, so maybe I’m not understanding correctly why others feel this is required, and I’m certainly open to discussing more.

Take care,

Frank

September 11th, 2012 on 1:15 am

@Frank: Sure, an average is just that, an average 🙂

Still, though, as a customer waiting in a super market cashier line an average waiting time does not really make me happy if right before me is a guy with his cart filled up to the brim. I know for sure, my personal waiting time will be way above the average.

That´s why there are express lanes, I´d say. To lower the average. And to make waiting time more predictable on a personal level.

Coming back to the 80% utilization: If a network is utilized only 10% or 40% even a large change in work item size does not affect the ability to take up more work. Maybe just one more work item then leads to a 65% utilization.

But above 80% a possible, even likely variation in work item size might lead to network overload. That´s when things break down.

So I´d say not only it´s important to know the average work item size, but also the variance in work item size. If it´s small, i.e. work items are of pretty much the same size, then all´s dandy. But if the variance is large… then there is a real danger of exceeding buffer capacity once utilization is high.

But maybe I´m misunderstanding a crucial point?

September 11th, 2012 on 7:19 pm

Hi Ralf,

Thanks for the follow-up. I think your understanding is fine. That is, I agree with your second comment completely :>)

Still, from your first comment, I have to admit I’m stumbling on the following:

“But doesn´t that mean, any system where work item size is not uniform is not a stable system?

“To me that would mean, the first question to ask about a system is: can the work item/packet size be assumed to be pretty much constant? If no, the system is not stable and non linear effects can be expected and LT=WIP/TP cannot be readily applied.”

It could be me just missing something here.

Take care,

Frank

September 8th, 2012 on 9:31 am

Frank,

excellent blog post, thank you! And I‘m looking forward to part 2,3,4 and 5 😉

Cheers,

Arne

September 10th, 2012 on 9:58 pm

Hi Arne,

Thank you!! Yes, there is plenty of opportunity for follow-up conversations and posts on this topic :>) I look forward to catching up some with you on this at LKCE2012 as well.

Take care,

Frank