这两天在处理一个内核资源泄漏的问题,整个分析过程比较典型,记在这里以便以后参考。
这个问题发生在测试部同事的真实机上,于是抓内核DMP用的是sysinternals的livekd,命令行很简单:
livekd -o dump-file-path
注:为了分析,我开了GFlags中的kernel pool tagging功能。
分析过程如下:
1 0: kd> !vm
2
3 *** Virtual Memory Usage ***
4 Physical Memory: 842859 ( 3371,436 Kb)
5 ...(Boxcounter:此处省略PagingFile相关输出)
6 Available Pages: 582485 ( 2329,940 Kb)
7 ResAvail Pages: 756136 ( 3024,544 Kb)
8 Locked IO Pages: 51 ( 204 Kb)
9 Free System PTEs: 32523 ( 130,092 Kb)
10 Free NP PTEs: 32767 ( 131,068 Kb)
11 Free Special NP: 94800 ( 379,200 Kb)[/font]
12 Modified Pages: 17626 ( 70,504 Kb)
13 Modified PF Pages: 17131 ( 68,524 Kb)
14 NonPagedPool Usage: 4096 ( 16,384 Kb) <<
15 NonPagedPoolNx Usage: 5076 ( 20,304 Kb)
16 NonPagedPool Max: 65536 ( 262,144 Kb)
17 PagedPool 0 Usage: 7665 ( 30,660 Kb)
18 PagedPool 1 Usage: 16753 ( 67,012 Kb)
19 PagedPool 2 Usage: 16728 ( 66,912 Kb)
20 PagedPool 3 Usage: 16766 ( 67,064 Kb)
21 PagedPool 4 Usage: 16725 ( 66,900 Kb)
22 PagedPool Usage: 74637 ( 298,548 Kb) <<
23 PagedPool Maximum: 92160 ( 368,640 Kb)
24 Session Commit: 2095 ( 8,380 Kb)
25 Shared Commit: 13172 ( 52,688 Kb)
26 Special Pool: 1175 ( 4,700 Kb)
27 Shared Process: 3457 ( 13,828 Kb)
28 PagedPool Commit: 74583 ( 298,332 Kb) <<
29 Driver Commit: 3840 ( 15,360 Kb)
30 Committed pages: 196936 ( 787,744 Kb)
31 Commit limit: 2327741 ( 9310,964 Kb)
32
33 Total Private: 95118 ( 380,472 Kb)
34 0c40 procexp.exe 19592 ( 78,368 Kb)
35 0860 360Tray.exe 14027 ( 56,108 Kb)
36 070c svchost.exe 13336 ( 53,344 Kb)
37 068c explorer.exe 7362 ( 29,448 Kb)
38 08b8 XDict.exe 7102 ( 28,408 Kb)
39 0470 RTHDCPL.EXE 5785 ( 23,140 Kb)
40 075c svchost.exe 4462 ( 17,848 Kb)
41 06d4 SEye.exe 2513 ( 10,052 Kb)
42 03b8 winlogon.exe 1739 ( 6,956 Kb)
43 019c ZhuDongFangYu.e 1519 ( 6,076 Kb)
44 ... (Boxcounter:省略剩余不重要的进程信息)
这段输出需要关注的重点是第16、22和28行。
看上去,nonpaged pool使用情况比较正常,主要嫌疑在paged pool上。
看看paged pool吧。
1 0: kd> !poolused 5
2 Sorting by Paged Pool Consumed
3
4 Pool Used:
5 NonPaged Paged
6 Tag Allocs Frees Diff Used Allocs Frees Diff Used
7 prpt 0 0 0 0 4191169 1396976 2794193 203,976,160 UNKNOWN pooltag 'prpt', please update pooltag.txt
8 otpu 0 0 0 0 349352 28 349324 30,269,696 UNKNOWN pooltag 'otpu', please update pooltag.txt
9 INTC 38799 38410 389 95952 1355200 1352980 2220 9,142,856 Intel video driver
10 Gh05 0 0 0 0 561321 560728 593 4,213,584 GDITAG_HMGR_SPRITE_TYPE , Binary: win32k.sys
11 UlHT 0 0 0 0 1 0 1 4,198,400 Hash Table , Binary: http.sys
12 MmSt 0 0 0 0 1463662 1462178 1484 3,680,824 Mm section object prototype ptes , Binary: nt!mm
13 VMON 2 0 2 72 648 0 648 2,646,432 Volume Manager , Binary: volmgr.sys
14 CM35 0 0 0 0 39 13 26 2,629,632 Internal Configuration manager allocations , Binary: nt!cm
15 Gcac 0 0 0 0 214 157 57 2,096,640 Gdi glyph cache
16 Ttfd 0 0 0 0 3713 3023 690 1,275,080 TrueType Font driver
17 Ifs 7166549 7166547 2 176 5361522 5360467 1055 1,131,376 Default file system allocations (user's of ntifs.h)
18 Ddk 9649715 9649684 31 931176 2684 790 1894 1,087,824 Default for driver allocated memory (user's of ntddk.h)
19 CMDa 0 0 0 0 216579 214244 2335 530,480 value data cache pool tag , Binary: nt!cm
20 Ntff 18 0 18 3744 84222 83589 633 526,656 FCB_DATA , Binary: ntfs.sys
21 Gla1 1 0 1 112 3961 3652 309 494,400 GDITAG_HMGR_LOOKASIDE_DC_TYPE , Binary: win32k.sys
22 Gla5 1 0 1 112 7449 6223 1226 480,592 GDITAG_HMGR_LOOKASIDE_SURF_TYPE , Binary: win32k.sys
23 logb 0 0 0 0 1959 1954 5 413,728 UNKNOWN pooltag 'logb', please update pooltag.txt
24 pipt 0 0 0 0 353979 351340 2639 407,664 UNKNOWN pooltag 'pipt', please update pooltag.txt
25 CM25 0 0 0 0 972 947 25 327,680 Internal Configuration manager allocations , Binary: nt!cm
26 Obtb 0 0 0 0 362905 362778 127 315,376 object tables via EX handle.c , Binary: nt!ob
重点是第7、8行这两个windbg不认识的tag。绝大部分的paged pool内存都是它们的。看看它们是谁。
C:\WINDOWS\system32\drivers>findstr /m /l prpt *sys
qutmdrv.sys
C:\WINDOWS\system32\drivers>findstr /m /l otpu *sys
qutmdrv.sys
找到这个驱动,看了下属性,安全卫士的,该驱动的版本是7.2.1.1022。
分析到这里我想起之前看过的一个类似的样例
现在找到了一个嫌疑人,为了确认没有误伤,卸掉我的驱动重启,等了10来分钟,看内存和cpu占用都稳定了,启动纯应用层测试工具做压力测试。
在测试前抓了个dmp,当时状态是这样的:
0: kd> .time
Debug session time: Wed Aug 15 10:37:44.500 2012 (UTC + 8:00)
System Uptime: 0 days 0:13:07.406
1 0: kd> !vm
2
3 *** Virtual Memory Usage ***
4 Physical Memory: 842859 ( 3371436 Kb)
5 ...(Boxcounter:此处省略PagingFile相关输出)
6 Available Pages: 703681 ( 2814,724 Kb)
7 ResAvail Pages: 757240 ( 3028,960 Kb)
8 Locked IO Pages: 50 ( 200 Kb)
9 Free System PTEs: 104176 ( 416,704 Kb)
10 Free NP PTEs: 32767 ( 131,068 Kb)
11 Free Special NP: 95598 ( 382,392 Kb)
12 Modified Pages: 1698 ( 6,792 Kb)
13 Modified PF Pages: 545 ( 2,180 Kb)
14 NonPagedPool Usage: 4096 ( 16,384 Kb) <<
15 NonPagedPoolNx Usage: 3164 ( 12,656 Kb)
16 NonPagedPool Max: 65536 ( 262,144 Kb)
17 PagedPool 0 Usage: 8987 ( 35,948 Kb)
18 PagedPool 1 Usage: 934 ( 3,736 Kb)
19 PagedPool 2 Usage: 934 ( 3,736 Kb)
20 PagedPool 3 Usage: 923 ( 3,692 Kb)
21 PagedPool 4 Usage: 944 ( 3,776 Kb)
22 PagedPool Usage: 12722 ( 50,888 Kb) <<
23 PagedPool Maximum: 92160 ( 368,640 Kb)
24 Shared Commit: 11888 ( 47,552 Kb)
25 Special Pool: 273 ( 1,092 Kb)
26 Shared Process: 3503 ( 14,012 Kb)
27 PagedPool Commit: 12690 ( 50,760 Kb) <<
28 Driver Commit: 3807 ( 15,228 Kb)
29 Committed pages: 111988 ( 447,952 Kb)
30 Commit limit: 2327741 ( 9310,964 Kb)
依然着重看第15、22和27行。
再重点看看两个嫌疑人的状态。
0: kd> !poolused 1 prpt
Sorting by Tag
Pool Used:
NonPaged Paged
Tag Allocs Frees Diff Used Allocs Frees Diff Used
prpt 0 0 0 0 1081 296 785 57,376 UNKNOWN pooltag 'prpt', please update pooltag.txt
otpu 0 0 0 0 184 30 154 11,104 UNKNOWN pooltag 'otpu', please update pooltag.txt
这个输出我做了一些调整,!poolused命令不支持同时查询多个tag,我使用了两次,然后把结果合并了。
看看经过几个小时的测试后的状态。
0: kd> .time
Debug session time: Wed Aug 15 15:26:00.875 2012 (UTC + 8:00)
System Uptime: 0 days 5:01:23.703
1 0: kd> !vm
2
3 *** Virtual Memory Usage ***
4 Physical Memory: 842859 ( 3,371,436 Kb)
5 ...(Boxcounter:此处省略PagingFile相关输出)
6 Available Pages: 604366 ( 2,417,464 Kb)
7 ResAvail Pages: 757320 ( 3,029,280 Kb)
8 Locked IO Pages: 50 ( 200 Kb)
9 Free System PTEs: 103840 ( 415,360 Kb)
10 Free NP PTEs: 32767 ( 131,068 Kb)
11 Free Special NP: 95604 ( 382,416 Kb)
12 Modified Pages: 1883 ( 7,532 Kb)
13 Modified PF Pages: 711 ( 2,844 Kb)
14 NonPagedPool Usage: 4096 ( 16,384 Kb) <<
15 NonPagedPoolNx Usage: 3144 ( 12,576 Kb)
16 NonPagedPool Max: 65536 ( 262,144 Kb)
17 PagedPool 0 Usage: 9867 ( 39,468 Kb)
18 PagedPool 1 Usage: 11069 ( 44,276 Kb)
19 PagedPool 2 Usage: 11059 ( 44,236 Kb)
20 PagedPool 3 Usage: 11054 ( 44,216 Kb)
21 PagedPool 4 Usage: 11077 ( 44,308 Kb)
22 PagedPool Usage: 54126 ( 216,504 Kb) <<
23 PagedPool Maximum: 92160 ( 368,640 Kb)
24 Session Commit: 2107 ( 8,428 Kb)
25 Shared Commit: 12882 ( 51,528 Kb)
26 Special Pool: 267 ( 1,068 Kb)
27 Shared Process: 3272 ( 13,088 Kb)
28 PagedPool Commit: 54058 ( 216,232 Kb) <<
29 Driver Commit: 3807 ( 15,228 Kb)
30 Committed pages: 163773 ( 655,092 Kb)
31 Commit limit: 2327741 ( 9310,964 Kb)
32
33 Total Private: 84063 ( 336,252 Kb)
34 061c 360Tray.exe 14005 ( 56,020 Kb)
35 0390 svchost.exe 13331 ( 53,324 Kb)
36 0460 explorer.exe 10689 ( 42,756 Kb)
37 0e18 procexp.exe 9525 ( 38,100 Kb)
38 0658 XDict.exe 6997 ( 27,988 Kb)
39 ...(Boxcounter:省略其余不重要输出)
再来看看tag情况。
1 0: kd> !poolused 5
2 Sorting by Paged Pool Consumed
3
4 Pool Used:
5 NonPaged Paged
6 Tag Allocs Frees Diff Used Allocs Frees Diff Used
7 prpt 0 0 0 0 2681313 893728 1787585 130,493,776 UNKNOWN pooltag 'prpt', please update pooltag.txt
8 otpu 0 0 0 0 223536 32 223504 19,366,280 UNKNOWN pooltag 'otpu', please update pooltag.txt
9 MmSt 0 0 0 0 740248 737979 2269 12,516,920 Mm section object prototype ptes , Binary: nt!mm
10 INTC 289851 289462 389 95952 1200152 1197885 2267 9,146,056 Intel video driver
11 Gh05 0 0 0 0 350733 350137 596 4,870,704 GDITAG_HMGR_SPRITE_TYPE , Binary: win32k.sys
12 UlHT 0 0 0 0 1 0 1 4,198,400 Hash Table , Binary: http.sys
13 VMON 2 0 2 72 648 0 648 2,646,432 Volume Manager , Binary: volmgr.sys
14 CM35 0 0 0 0 39 13 26 2,629,632 Internal Configuration manager allocations , Binary: nt!cm
15 Gcac 0 0 0 0 343 277 66 2,356,704 Gdi glyph cache
16 Ttfd 0 0 0 0 5625 4930 695 1,313,168 TrueType Font driver
17 Ifs 3897847 3897844 3 240 3283105 3282050 1055 1,131,376 Default file system allocations (user's of ntifs.h)
18 Ddk 622677 622646 31 931176 2654 760 1894 1,087,824 Default for driver allocated memory (user's of ntddk.h)
19 Ntff 18 0 18 3744 19431 18491 940 782,080 FCB_DATA , Binary: ntfs.sys
20 pipt 0 0 0 0 229268 226053 3215 499,136 UNKNOWN pooltag 'pipt', please update pooltag.txt
21 ...(Boxcounter:省略其余不重要输出)
第7、8行验证了前面的判断。目前不太清楚这是qutmdrv.sys的资源泄漏BUG,还是它的缓存机制。(最开始的情况更严重是因为那时已经测试约莫11个小时,而后面这次测试时间较短)
分析过程不长,命令也就那么几个。不过确实耗了我不少时间,主要麻烦是问题不好重现和思维方式转不过弯(之前我的驱动正好刚出现过一次资源泄漏问题,所以这次一看到现象就认为是我的驱动的问题……)。