I have been noticing that WPA tends to be a bit slow at expanding tree nodes. Alas, if only I had a tool available to investigate such things… oh, right!
So here we see the rather curious CPU usage pattern WPA shows while expanding a group of about 40,000 stack traces (taking over 300ms):
Each one of the big humps involves kicking off 8 new threads, each of which hangs around for just 50ms or so. And the tiny little bump at 5.82 seconds involves 4 new threads that only run for about 1ms each!
That seems less than ideal... what do the stacks show for those periods?
Stack, Count, Weight (in view) (ms)
microsoft.performance.shell.dll!Microsoft.Performance.Shell.Tables.Columns.StackFrameTagPathNodeColumn+StackFrameTagPathNodeColumnGenerator`2[System.__Canon,Microsoft.Performance.Shell.Tables.Columns.Generators.CachedOnFirstUseColumnGenerator`2[Microsoft.Performance.PerfCore4.StackTopCPtr,M, 593, 592.057600
|- microsoft.performance.shell.dll!Microsoft.Performance.Shell.Stacks.StackFrameTagsReferenceService::FrameTagsFromTop 0x0, 585, 583.954800
| |- clr.dll!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel, , 542, 540.769400
| |- microsoft.performance.core4.interop.dll!Microsoft.Performance.PerfCore4.StackFrameTagMapperExtensions::FrameTagsFromTop 0x0, 37, 37.151800
| |- microsoft.performance.shell.dll!Microsoft.Performance.Shell.Stacks.StackFrameTagsReferenceService::FrameTagsFromTop 0x0<itself>, 3, 2.980300
Spinning up all those threads only to get worse-than-single-threaded performance seems not so great. And why is it doing it twice? I've only expanded a single node!
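If JIT_MonEnterWorker is what it looks like (the CLR's monitor-enter helper, i.e. what a C# lock statement calls), then most of the sampled time is threads waiting on one shared lock rather than doing work. Here is a minimal Python sketch of that shape, not WPA's actual code: the names `tag_frames` and `run` and the shared `cache` dictionary are all hypothetical, and Python's GIL adds serialization of its own, so treat any timings as illustrative only.

```python
import threading
import time


def tag_frames(stacks, lock, cache):
    """Hypothetical worker: every item takes the same shared lock."""
    for stack_id in stacks:
        # The real work per item is tiny, so the lock dominates.
        with lock:
            cache[stack_id] = cache.get(stack_id, 0) + 1


def run(num_threads, stacks):
    """Split the stacks across num_threads workers that share one lock."""
    lock = threading.Lock()
    cache = {}
    chunks = [stacks[i::num_threads] for i in range(num_threads)]
    threads = [
        threading.Thread(target=tag_frames, args=(chunk, lock, cache))
        for chunk in chunks
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache, time.perf_counter() - start


if __name__ == "__main__":
    # About 40,000 stacks, as in the expansion measured above.
    stacks = [i % 1000 for i in range(40000)]
    for n in (1, 8):
        _, elapsed = run(n, stacks)
        print(f"{n} thread(s): {elapsed * 1000:.1f} ms")
```

Because every item needs the same lock, the eight workers effectively run one at a time and pay for thread startup and lock handoff on top. The results are identical to the single-threaded run, just no faster.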