Understand System Level Thermal Resistance

From George, CEO Celsia: Today’s post on understanding thermal resistance is from Dr. Ross Wilcoxon, a thermal management veteran working for Rockwell Collins as a research and development engineer. He specializes in mechanical packaging of electronics used in harsh environments.

If any of you would like to become a guest blogger, please contact me at gmeyer@celsiainc.com.

From Dr. Wilcoxon:

System Level Thermal Resistance – Understand the Problem Before Trying to Solve It

Unless your electronics happen to be in an unusual place, such as on a satellite or on a downhole drilling head, they will be air cooled. Even systems that are liquid cooled ultimately dump heat to the ambient air from a radiator. If you look at things from a high enough level, this convective system cooling is pretty straightforward. The overall system-level cooling thermal resistance (i.e. the thermal resistance between a system’s external surfaces and the inlet cooling air) can be expressed as:

R_system-level = R_convection + R_latent, where:

R_convection = 1/hA, with h = the convection coefficient and A = surface area

R_latent = 1/mc_p, where m is the mass flow of cooling fluid and c_p is the fluid’s specific heat

OK, for you marginal purists who think that the definition of R_latent should have a 2 in the denominator (so that it references the average air temperature rather than the exhaust temperature), I tend to agree – but it depends a bit on your system design requirements. For you real purists who think that the whole equation is bogus and we need to use a heat exchanger analysis to be truly accurate, again I agree. But the preceding equations are the easiest to use for this discussion and are often close enough for electronics cooling applications. Keep in mind that the system level thermal resistance does not include any internal thermal resistances (the temperature gradients associated with getting heat from the power dissipating components to the external surface of the system); it only accounts for transferring heat from some external surface of the system to the surrounding air.
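As a rough sketch of the two-term model above, the Python below evaluates each resistance separately and then sums them. The function names and the sample inputs (h = 50 W/m²K, A = 0.1 m², m = 0.01 kg/s) are illustrative assumptions rather than values from a real design, and the specific heat of air is taken as roughly 1000 J/kg K.

```python
def convection_resistance(h, area):
    """R_convection = 1/(h*A) in C/W, with h in W/m^2K and area in m^2."""
    return 1.0 / (h * area)


def latent_resistance(mass_flow, cp=1000.0):
    """R_latent = 1/(m*c_p) in C/W, with mass_flow in kg/s and cp in J/kgK."""
    return 1.0 / (mass_flow * cp)


def system_resistance(h, area, mass_flow, cp=1000.0):
    """Sum of the two external terms; internal resistances are not included."""
    return convection_resistance(h, area) + latent_resistance(mass_flow, cp)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
r_conv = convection_resistance(h=50.0, area=0.1)   # 1/(50*0.1)
r_lat = latent_resistance(mass_flow=0.01)          # 1/(0.01*1000)
r_sys = system_resistance(h=50.0, area=0.1, mass_flow=0.01)

print(f"R_convection = {r_conv:.2f} C/W")
print(f"R_latent     = {r_lat:.2f} C/W")
print(f"R_system     = {r_sys:.2f} C/W")
```

Keeping the two terms as separate functions makes it easy to see which one a proposed design change would actually move.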

There are a couple of things to note about the parameters in the overall thermal resistance equation. First of all, since they are all in the denominators of the equations, if we want to reduce the thermal resistance we have to increase one or more of them. We can increase the convection coefficient by making the flow more turbulent and, in some cases, by increasing the air velocity; we can increase the surface area by adding fins; more flow corresponds to a higher mass flow rate (as does using a higher density fluid); and a fluid with a larger specific heat will increase cooling capacity. So, for example, high humidity air has a slightly higher specific heat than dry air – but its density is lower by an even smaller amount, which slightly offsets the already small improvement.

More importantly, the overall thermal resistance is the sum of two different resistances. So it is important that the contribution of each thermal resistance be understood in order to ensure that attempted improvements actually accomplish something. If the convective thermal resistance is 10C/W and the latent thermal resistance is 1C/W, adding more fans to increase the flow rate of air won’t do much of anything to reduce the overall thermal resistance. OK, it might help a little bit because higher flow rates may lead to somewhat higher convection coefficients, but the improvement won’t likely be huge. On the other hand, if your system doesn’t have a sufficient flowrate, putting in a bigger heat sink will lead to less than stellar improvements.
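To put numbers on the 10 C/W and 1 C/W example, here is a small Python sketch. It assumes, purely for illustration, that doubling the mass flow halves the latent term without changing the convection coefficient, and that a bigger heat sink doubles hA; neither assumption will hold exactly in a real system.

```python
def percent_improvement(before, after):
    """Percentage reduction in thermal resistance relative to the starting value."""
    return 100.0 * (before - after) / before


r_convection = 10.0  # C/W, from the example in the text
r_latent = 1.0       # C/W, from the example in the text

baseline = r_convection + r_latent             # 11.0 C/W
# Assumption: doubling mass flow halves R_latent and leaves h unchanged.
more_fans = r_convection + r_latent / 2.0      # 10.5 C/W
# Assumption: doubling h*A (for example, more fin area) halves R_convection.
bigger_sink = r_convection / 2.0 + r_latent    # 6.0 C/W

print(f"more fans:   {percent_improvement(baseline, more_fans):.1f}% lower")
print(f"bigger sink: {percent_improvement(baseline, bigger_sink):.1f}% lower")
```

Under these assumptions, doubling the airflow buys less than 5% while doubling hA buys about 45%, which is why it pays to know which term dominates before spending effort on either one.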

Understanding the contributions of the individual thermal resistances provides guidance on where to focus your efforts to improve a design. If the overall thermal resistance is dominated by convection, there may be opportunities to reduce fan speed and therefore the noise. Or if the latent thermal resistance is more significant in a design with some margin, a smaller heat sink could reduce system weight without degrading thermal performance. Maybe these suggestions are obvious, but it just seems worth pointing out that it is a good idea to estimate the magnitude of both terms in the system thermal resistance before embarking on some attempt to improve it.

That is, of course, assuming that you have control over both terms; sometimes one of the terms may be constrained by other factors. For example, commercial avionics receive a specific amount of airflow from the aircraft’s cooling system based on their power dissipation. Under normal operation, a system will receive sufficient airflow such that the air’s exhaust temperature is ~15C above the inlet temperature – regardless of what the system power dissipation is.
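One consequence of a fixed ~15C air temperature rise is that the latent resistance is no longer a free design variable: it is simply the rise divided by the power. The sketch below shows that relationship, assuming a specific heat of about 1000 J/kg K. The power levels and function names are hypothetical, picked only for illustration.

```python
CP_AIR = 1000.0  # J/kgK, approximate specific heat of air


def allotted_mass_flow(power_w, air_rise_c=15.0, cp=CP_AIR):
    """Mass flow (kg/s) that gives the stated air temperature rise: m = Q/(c_p*dT)."""
    return power_w / (cp * air_rise_c)


def latent_resistance_at_fixed_rise(power_w, air_rise_c=15.0):
    """R_latent = 1/(m*c_p) reduces to dT/Q when the air rise is fixed."""
    return air_rise_c / power_w


# Hypothetical power dissipations.
for power in (50.0, 100.0, 200.0):
    m = allotted_mass_flow(power)
    r = latent_resistance_at_fixed_rise(power)
    print(f"Q = {power:5.0f} W: m = {m:.4f} kg/s, R_latent = {r:.3f} C/W")
```

In this situation the only term left to work on is the convective one, since the airflow allocation is set by someone else.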

Keeping a big picture perspective on the influence of mass flow on temperature rise can extend beyond just the simple thermal resistance equation that I showed. Sometimes, things that are done to improve the thermal resistance for a given sub-system can lead to it actually getting hotter if one doesn’t keep track of the overall system. I saw an example of this a few years ago when I was working on a prototype radio for an airborne demonstration. The radio was mounted in a chassis that was placed in an auxiliary rack of an electronics pod on a military aircraft. The pod included a primary electronics system that used chilled air that was supplied by a refrigeration system built into the pod. Once the cooling air flowed through that primary system, it then flowed through the auxiliary rack before leaving the pod. The air flow was split more or less equally between the three slots in the auxiliary rack; for our demonstration two of the slots were empty.

There was considerable concern about our radio getting too hot in this demonstration. It had actually not been designed for use in this type of application; it really was just a lab prototype that was pressed into service for a field demonstration. Everyone involved was intent on improving the radio’s thermal management. So I didn’t think too much of it when one of our electrical engineers mentioned that he had helped to rebalance the airflow in order to improve the cooling in our slot in the pod.

A week later I was pulled into a panicked telecon because the system was shutting down due to high temperatures during the field demonstrations. Everyone had had such high hopes for this test since the airflow had been rebalanced. I eventually smartened up enough to ask exactly what was meant by ‘rebalancing’ the airflow. It turned out that it consisted of putting tape across the exits of the two empty slots in the auxiliary rack to force all of the cooling air to go through the slot that held our radio. That sounded fine until one actually thought about it. Since the three slots were in parallel at the end of the flow channel, blocking two of the slots increased the total pressure drop through the system and thereby reduced the flow rate that the fan could generate.

The primary electronics system in the pod probably dissipated ten times more power than our radio. So any reduction in the flow going through the system would lead to a significant increase in the temperature of the air as it left the primary electronics system – which meant that our radio was being cooled by hotter air. In terms of the system level thermal resistance, the increase in latent resistance could overwhelm any reduction in convective resistance associated with the higher flow velocity in one slot of the auxiliary rack. I recommended getting rid of the tape on the other slots but, while it did help somewhat, that still wasn’t enough to get things to run cool enough. Heroic measures were performed by the guys at the test to keep the radio running (that’s probably a topic for another essay), and they made it through the day with sufficient data and a reasonably happy customer.
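The effect can be sketched with a simple energy balance on the air stream. Every number below is made up for illustration: the actual powers, supply temperature, and flow rates of the pod are not given here. The only thing taken from the story is the roughly ten-to-one power ratio and the three parallel slots. Convection is left out entirely, so this only tracks the air temperature.

```python
CP_AIR = 1000.0  # J/kgK, approximate specific heat of air


def air_rise(power_w, mass_flow, cp=CP_AIR):
    """Temperature rise (C) of an air stream absorbing power_w: dT = Q/(m*c_p)."""
    return power_w / (mass_flow * cp)


def radio_exhaust_temp(t_supply, q_primary, q_radio, total_flow, radio_fraction):
    """Air temperature leaving the radio slot, downstream of the primary system."""
    t_radio_inlet = t_supply + air_rise(q_primary, total_flow)
    return t_radio_inlet + air_rise(q_radio, total_flow * radio_fraction)


# Hypothetical values: 10 C supply air, 1000 W primary system, 100 W radio.
# Assumption: taping two slots drops the total flow from 0.10 to 0.06 kg/s.
open_slots = radio_exhaust_temp(10.0, 1000.0, 100.0,
                                total_flow=0.10, radio_fraction=1.0 / 3.0)
taped_slots = radio_exhaust_temp(10.0, 1000.0, 100.0,
                                 total_flow=0.06, radio_fraction=1.0)

print(f"radio exhaust, all slots open: {open_slots:.1f} C")
print(f"radio exhaust, two slots taped: {taped_slots:.1f} C")
```

With these assumed numbers, the radio slot gets more air after taping, but the air arriving at the slot is several degrees hotter, and the net result is worse.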

As it turned out, when things were torn down at the end of the day, someone discovered a small washer had been left in the excessively thick film of thermal grease between the radio and the mounting plate. This had apparently happened when our radio was connected to its mounting plate (by someone at the company that we were teamed with on this program). Once that washer (as well as most of the thermal grease) was removed, the thermal performance of the assembly considerably improved. With the improved, non-washer configuration, the system met thermal performance requirements – and the detrimental effects of the ‘rebalanced’ flow configuration would have likely been more noticeable.

There are probably a couple of morals in that story, the first of which being that if you have so much thermal grease that you are misplacing washers in it, that is probably going to be a problem. Applying too much thermal grease is a) apparently a characteristic that is embedded in human DNA (people inherently believe that if a little grease is good then a lot of grease must be great), and b) a topic suitable for a stand-alone blog entry/rant that goes beyond the system-level perspective that was the goal of this entry.

The main point relevant to that system-level perspective is that one should keep in mind that faster, or even more, air flow does not necessarily lead to better cooling. Same for a bigger heat sink. It’s important to remember that, when you do something to reduce one of the two thermal resistances that make up the system level resistance, you want to avoid doing something stupid that increases the other one.

The challenge in that is knowing enough about your system to be able to judge the magnitude of the two thermal resistances. Specific heat of air is the easy one: just assume it is 1 kJ/kg K. You should be able to make a rough estimate of the heat transfer area if you know anything about the geometry of your system. For the convection coefficient, you can just SWAG numbers of 10 and 50 W/m²K for free and forced convection respectively. Flow rate is a bit tougher – if you know what the maximum flow rate of the fan is, assume you are running at half that. Or if you have an actual system and can measure an exhaust temperature (and know your power dissipation), you can estimate the mass flow rate by m = Q/[c_p (T_exit – T_inlet)]. That should work for either free or forced convection. All these numbers are likely to be fairly inaccurate, but that’s OK since the goal is not to nail down precise numbers. Instead, the main objective is to compare the magnitudes of the two thermal resistances to determine if one of them is dominating the other and should receive more attention in any attempts to improve the thermal management.
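The estimating recipe above can be strung together in a few lines of Python. This is only a sketch: the function names are made up, the air density of 1.2 kg/m³ used to convert a fan's volumetric rating to mass flow is an added assumption, and the example system (200 W, 0.05 m² of surface, 10 C air rise) is hypothetical.

```python
CP_AIR = 1000.0    # J/kgK, the "easy one"
H_FREE = 10.0      # W/m^2K, SWAG for free convection
H_FORCED = 50.0    # W/m^2K, SWAG for forced convection


def mass_flow_from_exhaust(power_w, t_exit_c, t_inlet_c, cp=CP_AIR):
    """m = Q / [c_p * (T_exit - T_inlet)], in kg/s."""
    return power_w / (cp * (t_exit_c - t_inlet_c))


def mass_flow_from_fan(max_flow_m3_per_s, air_density=1.2):
    """Assume the fan runs at half its maximum rated flow.

    air_density (kg/m^3) is an assumed value near sea-level conditions.
    """
    return 0.5 * max_flow_m3_per_s * air_density


def compare_resistances(area_m2, mass_flow, forced=True, cp=CP_AIR):
    """Return (R_convection, R_latent, name of the larger term)."""
    h = H_FORCED if forced else H_FREE
    r_conv = 1.0 / (h * area_m2)
    r_lat = 1.0 / (mass_flow * cp)
    dominant = "convection" if r_conv > r_lat else "latent"
    return r_conv, r_lat, dominant


# Hypothetical system: 200 W, exhaust 10 C above inlet, 0.05 m^2 of surface.
m = mass_flow_from_exhaust(power_w=200.0, t_exit_c=35.0, t_inlet_c=25.0)
r_conv, r_lat, dominant = compare_resistances(area_m2=0.05, mass_flow=m)
print(f"m = {m:.3f} kg/s, R_convection = {r_conv:.2f} C/W, "
      f"R_latent = {r_lat:.2f} C/W, dominant term: {dominant}")
```

The outputs are only as good as the SWAG inputs, but a ratio of several to one between the two terms is usually enough to say where to look first.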

Please contact us if you’d like to learn more about how Celsia can help with your next heat sink project. We’ve worked on everything from consumer devices to industrial test equipment that require heat sinks to cool anywhere from a few watts to a few kilowatts.