Press "Enter" to skip to content

线上java程序CPU占用过高问题排查

故障描述

突然收到线长告警,服务器负载高,java项目,程序运行在容器里面。

定位线程

top指令

[admin@host-11-21-22 ~]$ top
top - 17:42:32 up 1340 days,  6:45,  2 users,  load average: 10.27, 11.13, 11.81
Tasks:  24 total,   1 running,  23 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us,  3.8%sy,  0.0%ni, 85.7%id,  0.9%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  263562880k total, 247574712k used, 15988168k free,  2988992k buffers
Swap:        0k total,        0k used,        0k free, 134288604k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   169 admin     20   0 11.3g 500m  15m S 99.5  0.2 101:01.67 java
     1 root      20   0  105m 1356 1164 S  0.0  0.0   0:00.17 sh
    38 root      20   0  114m 1296  652 S  0.0  0.0   0:00.59 crond
149044 admin     20   0 24080 1312 1040 R  0.0  0.0   0:00.08 top

可以明显看到 进程ID是169的java程序明显CPU占用异常。

定位线程

top -H -p pid

那么我们输入 top -H -p 169

top - 17:44:41 up 1340 days,  6:47,  2 users,  load average: 10.30, 10.72, 11.56
Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.0%us,  3.3%sy,  0.0%ni, 89.4%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  263562880k total, 247613040k used, 15949840k free,  2989060k buffers
Swap:        0k total,        0k used,        0k free, 134326568k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   315 admin     20   0 11.3g 500m  15m R 97.5  0.2  65:03.20 java
   312 admin     20   0 11.3g 500m  15m S  1.0  0.2  10:24.47 java
   368 admin     20   0 11.3g 500m  15m S  1.0  0.2  11:27.13 java
   406 admin     20   0 11.3g 500m  15m S  1.0  0.2  11:25.67 java
   896 admin     20   0 11.3g 500m  15m S  0.3  0.2   0:03.38 java
   169 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:00.00 java
   175 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:22.65 java
   177 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:24.29 java
   178 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:00.24 java
   179 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:00.25 java
   180 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:00.00 java
   181 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:01.74 java
   182 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:03.93 java
   184 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:00.61 java
   185 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:02.35 java
   186 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:01.79 java
   213 admin     20   0 11.3g 500m  15m S  0.0  0.2   0:06.30 java

这时我们找到了子线程的ID是 315,这是10进制的,等下我们需要他的16进制ID。

简单的shell来完成进制转换 printf ‘%x\n’ number

printf '%x\n' 315

13b

那我们得到了 13b。

定位线程

打印线程栈 jstack -l pid

jstack -l 169 | grep 13b


"CLIENT_SIDE_RINGBUFFER" #41 daemon prio=5 os_prio=0 tid=0x00007feb813bd800 nid=0x150 waiting on condition [0x00007feb2119c000]
"LoggingFilter-pushJob" #30 prio=5 os_prio=0 tid=0x00007feb80a5c800 nid=0x13b runnable [0x00007feb227a4000]

找到他了,就是这个 LoggingFilter-pushJob 线程再搞怪,去分析问题吧!

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注