定时器不执行问题

业务背景

最近生产环境遇到了一个问题,服务运行正常,接口也可以访问,但是项目中一个定时任务突然不执行了,并且没有什么异常信息,这个定时任务是每天执行一次,去请求第三方接口拉取数据,由于不是什么重要的数据,这个问题在3天后才被发现

问题猜想

因为项目中使用的定时器是使用springboot自带的@Scheduled注解实现的,通过跟踪源码发现,在springboot执行定时任务时,框架默认只启动一个线程执行,如果前一个任务由于某些原因导致线程被阻塞,那么就会出现之后所有的定时器都不会执行,直到该线程被释放掉。

问题排查

  1. 使用linux客户端登录到生产服务器

  2. 输入top命令,发现cpu和内存的使用率并不是很高

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    [root@iz2ze9hbg36c3rlf5v08uwz ~]# top
    top - 14:36:27 up 52 days, 3:40, 2 users, load average: 0.47, 0.14, 0.08
    Tasks: 77 total, 1 running, 76 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 1.7 us, 0.7 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 1883724 total, 114700 free, 436760 used, 1332264 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 1200096 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2477 root 10 -10 146436 27316 5996 S 1.3 1.5 969:36.62 AliYunDun
    16380 root 20 0 2074724 76296 11056 S 0.3 4.1 97:21.58 java
    25664 root 20 0 156556 8420 1400 S 0.3 0.4 64:42.31 redis-server
    1 root 20 0 190964 3688 2276 S 0.0 0.2 9:26.19 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.04 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 1:12.96 ksoftirqd/0
    5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
    7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
    8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
    9 root 20 0 0 0 0 S 0.0 0.0 27:56.73 rcu_sched
    10 root rt 0 0 0 0 S 0.0 0.0 0:21.11 watchdog/0

  3. 输入jps命令,查询java进程

1
2
3
4
[root@iz2ze9hbg36c3rlf5v08uwz ~]# jps
31841 Jps
16380 Application
31663 jar
  1. 输入top -Hp 31663,显示改进程的线程信息列表
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
[root@iz2ze9hbg36c3rlf5v08uwz ~]# top -Hp 31663
top - 14:40:50 up 52 days, 3:44, 2 users, load average: 0.02, 0.10, 0.08
Threads: 26 total, 0 running, 26 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.3 us, 0.7 sy, 0.0 ni, 97.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1883724 total, 113080 free, 438296 used, 1332348 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1198552 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31663 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 java
31664 root 20 0 2533316 127148 13704 S 0.0 6.7 0:03.23 java
31665 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.23 VM Thread
31666 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.01 Reference Handl
31667 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.01 Finalizer
31668 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 Signal Dispatch
31669 root 20 0 2533316 127148 13704 S 0.0 6.7 0:03.57 C2 CompilerThre
31670 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.85 C1 CompilerThre
31671 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 Service Thread
31672 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.19 VM Periodic Tas
31678 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.05 Catalina-utilit
31679 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 Catalina-utilit
31680 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 container-0
31682 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.08 http-nio-8080-e
31683 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31684 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31685 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31686 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31687 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31688 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31689 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31690 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31691 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-e
31692 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.02 http-nio-8080-P
31693 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.00 http-nio-8080-A
31694 root 20 0 2533316 127148 13704 S 0.0 6.7 0:00.05 scheduling-1
  1. 找到定时器对应的线程号31694,将其转换成16进制后,在打印其堆栈信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
[root@iz2ze9hbg36c3rlf5v08uwz ~]# jstack 31663 | grep "7bce" -C100 --color
2021-08-06 14:48:05
Full thread dump OpenJDK 64-Bit Server VM (25.292-b10 mixed mode):

"Attach Listener" #27 daemon prio=9 os_prio=0 tid=0x00007f32080ca800 nid=0x7e2a waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"DestroyJavaVM" #26 prio=5 os_prio=0 tid=0x00007f323404b800 nid=0x7bb0 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"scheduling-1" #25 prio=5 os_prio=0 tid=0x00007f3234b98000 nid=0x7bce runnable [0x00007f320d3b3000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000e3c1ad78> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
- locked <0x00000000e3c1ae28> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
- locked <0x00000000e3c1ae28> (a sun.net.www.protocol.http.HttpURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.springframework.http.client.SimpleBufferingClientHttpRequest.executeInternal(SimpleBufferingClientHttpRequest.java:82)
at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:66)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:776)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:711)
at org.springframework.web.client.RestTemplate.getForEntity(RestTemplate.java:361)
at com.example.demo.TestTask.task(TestTask.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
  1. 通过堆栈信息发现定时器线程被阻塞在这一行
1
at com.example.demo.TestTask.task(TestTask.java:25)
  1. 查看代码发现这一行调用了http接口,但是没有设置超时时间,导致线程一直在尝试连接
1
2
3
4
5
6
7
8
9
10
11
@Scheduled(cron = "0 */1 * * * ?")
public void task() {
LocalDateTime now = LocalDateTime.now();
System.out.println(String.format("当前时间: %s", df.format(now)));

//调用测试的接口
RestTemplate restTemplate = new RestTemplate();
ResponseEntity<String> entity = restTemplate.getForEntity("http://localhost:8080/test", String.class);

System.out.println(entity.getBody());
}

总结

最终我们通过为HTTP请求设置过期时间,解决了该问题,其实排查问题时也可以借助一些其他的工具,如Arthas等,也能更好的帮助我们排查问题

文章目录
  1. 1. 业务背景
  2. 2. 问题猜想
  3. 3. 问题排查
  4. 4. 总结
|