故障描述
突然收到线长告警,服务器负载高,java项目,程序运行在容器里面。
定位线程
top指令
[admin@host-11-21-22 ~]$ top
top - 17:42:32 up 1340 days, 6:45, 2 users, load average: 10.27, 11.13, 11.81
Tasks: 24 total, 1 running, 23 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.0%us, 3.8%sy, 0.0%ni, 85.7%id, 0.9%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 263562880k total, 247574712k used, 15988168k free, 2988992k buffers
Swap: 0k total, 0k used, 0k free, 134288604k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
169 admin 20 0 11.3g 500m 15m S 99.5 0.2 101:01.67 java
1 root 20 0 105m 1356 1164 S 0.0 0.0 0:00.17 sh
38 root 20 0 114m 1296 652 S 0.0 0.0 0:00.59 crond
149044 admin 20 0 24080 1312 1040 R 0.0 0.0 0:00.08 top
可以明显看到 进程ID是169的java程序明显CPU占用异常。
定位线程
top -H -p pid
那么我们输入 top -H -p 169
top - 17:44:41 up 1340 days, 6:47, 2 users, load average: 10.30, 10.72, 11.56
Tasks: 145 total, 1 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.0%us, 3.3%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 263562880k total, 247613040k used, 15949840k free, 2989060k buffers
Swap: 0k total, 0k used, 0k free, 134326568k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
315 admin 20 0 11.3g 500m 15m R 97.5 0.2 65:03.20 java
312 admin 20 0 11.3g 500m 15m S 1.0 0.2 10:24.47 java
368 admin 20 0 11.3g 500m 15m S 1.0 0.2 11:27.13 java
406 admin 20 0 11.3g 500m 15m S 1.0 0.2 11:25.67 java
896 admin 20 0 11.3g 500m 15m S 0.3 0.2 0:03.38 java
169 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:00.00 java
175 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:22.65 java
177 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:24.29 java
178 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:00.24 java
179 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:00.25 java
180 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:00.00 java
181 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:01.74 java
182 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:03.93 java
184 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:00.61 java
185 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:02.35 java
186 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:01.79 java
213 admin 20 0 11.3g 500m 15m S 0.0 0.2 0:06.30 java
这时我们找到了子线程的ID是 315,这是10进制的,等下我们需要他的16进制ID。
简单的shell来完成进制转换 printf ‘%x\n’ number
printf '%x\n' 315
13b
那我们得到了 13b。
定位线程
打印线程栈 jstack -l pid
jstack -l 169 | grep 13b
"CLIENT_SIDE_RINGBUFFER" #41 daemon prio=5 os_prio=0 tid=0x00007feb813bd800 nid=0x150 waiting on condition [0x00007feb2119c000]
"LoggingFilter-pushJob" #30 prio=5 os_prio=0 tid=0x00007feb80a5c800 nid=0x13b runnable [0x00007feb227a4000]
找到他了,就是这个 LoggingFilter-pushJob 线程再搞怪,去分析问题吧!