This post is an addition to the series of posts I have written on TMG/ISA performance issues. Every time I get a case on a TMG performance issue, there is some new twist to the scenario, but it always circles back to a few known root causes: name resolution, authentication issues with the authenticating servers, CPU spikes from causes such as excessive logging, and at times disk latency.
So whenever I get a case with a performance issue, these root causes are what I look for first. This time the issue hit the authentication piece: the TMG server would stop responding intermittently. I sent an action plan for data collection as per my blog post http://blogs.technet.com/b/isablog/archive/2011/01/13/random-authentication-prompts-while-accessing-internet-through-isa-server-followed-by-isa-server-becoming-unresponsive.aspx.
In the perfmon logs I found the backlogged packets counter spiking. As we know (please refer to my previous posts on this topic for more info), this can happen because of either authentication or name resolution issues. I then looked at the netlogon logs that were collected; one snippet is pasted below.
******************************************************************************************************
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[LOGON] SamLogon: Network logon of contoso\jsmith from x Entered
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[LOGON] SamLogon: Network logon of contoso\testuser from x Returns 0xC000005E
******************************************************************************************************
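When a netlogon log is large, a quick tally of these entries helps show how widespread the time-outs are. The following is a minimal sketch, not part of the original data collection: it assumes the log lines look like the snippet above, and the names `SAMPLE_LOG` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the netlogon log snippet in this post.
SAMPLE_LOG = """\
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[LOGON] SamLogon: Network logon of contoso\\jsmith from x Entered
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[CRITICAL] contoso: NlAllocateClientApi timed out: 0 253
[CRITICAL] contoso: NlpUserValidateHigher: Can't allocate Client API slot.
[LOGON] SamLogon: Network logon of contoso\\testuser from x Returns 0xC000005E
"""

# Matches the status code at the end of a completed SamLogon entry.
RETURN_RE = re.compile(r"SamLogon: .* Returns (0x[0-9A-Fa-f]+)")


def summarize(text):
    """Count NlAllocateClientApi time-outs and tally SamLogon return codes."""
    timeouts = 0
    statuses = Counter()
    for line in text.splitlines():
        if "NlAllocateClientApi timed out" in line:
            timeouts += 1
        match = RETURN_RE.search(line)
        if match:
            # Normalize the code to lowercase hex so 0xC000005E and
            # 0xc000005e are counted together.
            statuses[hex(int(match.group(1), 16))] += 1
    return timeouts, statuses


if __name__ == "__main__":
    timeouts, statuses = summarize(SAMPLE_LOG)
    print("NlAllocateClientApi time-outs:", timeouts)
    for code, count in statuses.most_common():
        print(code, count)
```

To run this against a real log, read the file contents and pass the text to `summarize`. The substring and regex searches do not depend on where in the line the entry starts, so a timestamp prefix on each line should not matter.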
There were a lot of entries with the status 0xc000005e. I looked up this hex code with err.exe and found the following:
*******************************************************************************************************
err 0xc000005e
# for hex 0xc000005e
/ decimal -1073741730 :
STATUS_NO_LOGON_SERVERS ntstatus.h
# There are currently no logon servers available to service
# the logon request.
# 1 matches found for "0xc000005e"
*******************************************************************************************************
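The decimal value that err.exe prints is the same 32-bit code read as a signed integer, and the top two bits of an NTSTATUS value carry its severity. The sketch below reproduces that arithmetic; the function names are illustrative only and are not part of err.exe.

```python
def ntstatus_to_signed(code):
    """Interpret a 32-bit NTSTATUS value as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    # Values with the high bit set are negative in two's complement.
    return code - 0x100000000 if code & 0x80000000 else code


def ntstatus_severity(code):
    """Return the severity encoded in bits 31-30 of an NTSTATUS value."""
    names = ("success", "informational", "warning", "error")
    return names[(code >> 30) & 0x3]


if __name__ == "__main__":
    status = 0xC000005E  # STATUS_NO_LOGON_SERVERS from the log above
    print(ntstatus_to_signed(status))  # -1073741730, matching err.exe
    print(ntstatus_severity(status))   # error
```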
We were also getting the "NlAllocateClientApi timed out" entries. The article http://support.microsoft.com/kb/2688798 explains this behavior:
"
When you discover authentication time-outs or delays (also known as MaxConcurrentApi bottlenecks) in an environment, the typical way to resolve the problem is to raise the maximum allowed worker threads that service that authentication. You do this by altering the MaxConcurrentApi registry value and then restarting the Net Logon service on the servers
"
We could clearly see the time-outs in our netlogon logs, so to resolve this we planned to increase the MaxConcurrentApi registry value on the problem TMG server. We increased it to 10, and after that the issue did not occur. I also suggested to the admin that if the problem recurs, we can calculate a new value from the perfmon logs and, if needed, use the perfmon logs to understand how the domain controllers are behaving. I found a nice post that explains a similar problem that was caused by a domain controller: http://blogs.msdn.com/b/spatdsg/archive/2006/01/05/507299.aspx
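For the "calculate a new value from the perfmon logs" step, here is a rough sketch of the arithmetic. It assumes the formula as I recall it from KB 2688798, (semaphore acquires + semaphore time-outs) * average semaphore hold time / collection length, so please verify it against the article before relying on it. The function name and the sample numbers are hypothetical and were not taken from this case.

```python
import math


def suggested_max_concurrent_api(acquires, timeouts, avg_hold_seconds,
                                 collection_seconds):
    """Estimate a MaxConcurrentApi value from Netlogon perfmon counters.

    Assumed formula (check against KB 2688798):
        (acquires + timeouts) * avg_hold_seconds / collection_seconds
    The counter values are assumed to cover the collection window only.
    """
    if collection_seconds <= 0:
        raise ValueError("collection_seconds must be positive")
    load = (acquires + timeouts) * avg_hold_seconds / collection_seconds
    # Round up, and never suggest fewer than one concurrent call.
    return max(1, math.ceil(load))


if __name__ == "__main__":
    # Hypothetical numbers for a 10-minute capture.
    print(suggested_max_concurrent_api(
        acquires=8000, timeouts=200,
        avg_hold_seconds=0.5, collection_seconds=600))
```

The result is only a starting estimate. If the average hold time itself is high, the domain controllers may be the slow part, and raising the value on the TMG server alone may not help.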