Docker HEALTHCHECK: Health Checks and Pitfalls
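The HEALTHCHECK instruction tells Docker how to determine whether a container is working properly. It was introduced in Docker 1.12.
Before HEALTHCHECK existed, the Docker engine could only judge a container's state by whether its main process had exited. If a program enters a deadlock or an infinite loop, the process does not exit, yet the container can no longer serve requests. Before 1.12, Docker could not detect this state and therefore would not reschedule the container, so some containers kept receiving user requests even though they could no longer serve them.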
First: let's try out HEALTHCHECK
1. Write a simple web service with a health check enabled:
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
CMD ["nginx", "-g", "daemon off;"]
# Fetch the home page and discard the body; exit status 0 means healthy
HEALTHCHECK CMD wget -O /dev/null http://localhost:80/
2. Build the image and start a container. docker ps shows the container's initial status: the health check is just starting (health: starting):
[root@VM_33_14_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1cafd6fae2cf registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1753_209 "nginx -g 'daemon ..." 3 seconds ago Up 3 seconds (health: starting) 80/tcp
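3. A short while later (with the default 30-second interval, the first check runs about 30 seconds after start), docker ps shows that the status has changed to healthy: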
[root@VM_33_14_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1cafd6fae2cf registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1753_209 "nginx -g 'daemon ..." About a minute ago Up About a minute (healthy) 80/tcp
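Instead of re-running docker ps by hand, the status can be polled. The following is a minimal sketch, not part of the original setup: the function name wait_healthy and the 120-second default timeout are assumptions chosen for illustration. It reads the .State.Health.Status field that docker inspect exposes for containers with a health check.

```shell
# Poll a container's health status until it is "healthy" or "unhealthy",
# or until a timeout expires.
# Arguments: container name or ID, timeout in seconds (default 120, an arbitrary choice).
wait_healthy() {
  name=$1
  timeout=${2:-120}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Status is one of: starting, healthy, unhealthy.
    # It stays empty here if the container has no health check or does not exist.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    case "$status" in
      healthy)   echo "healthy after ${elapsed}s"; return 0 ;;
      unhealthy) echo "unhealthy after ${elapsed}s"; return 1 ;;
    esac
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout}s (last status: ${status:-unknown})"
  return 2
}
```

For example, wait_healthy your-container-name 90 returns 0 as soon as the container reports healthy, 1 if it reports unhealthy, and 2 on timeout.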
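Second: take special care that the health check command produces the same result every time it runs
Suppose we change the check command above slightly: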
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
CMD ["nginx", "-g", "daemon off;"]
HEALTHCHECK CMD wget http://localhost:80/
We removed -O /dev/null. What happens?
1. After the container starts, the status is health: starting:
[root@VM_33_14_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5aab61378e58 registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1804_210 "nginx -g 'daemon ..." 3 seconds ago Up Less than a second (health: starting) 80/tcp
2. A short while later, the container is healthy:
[root@VM_33_14_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5aab61378e58 registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1804_210 "nginx -g 'daemon ..." About a minute ago Up About a minute (healthy) 80/tcp
3. After two minutes, the container becomes unhealthy:
[root@VM_33_14_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5aab61378e58 registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1804_210 "nginx -g 'daemon ..." 2 minutes ago Up 2 minutes (unhealthy) 80/tcp
The container is then rescheduled; note that the CONTAINER ID has changed:
[root@VM_33_5_centos ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cb1da285ac16 registry.cn-shanghai.aliyuncs.com/YOUR-REGISTRY-NAME/opsdemo_opsweb:20170908_1804_210 "nginx -g 'daemon ..." About a minute ago Up About a minute (healthy) 80/tcp opsweb.1.i4u54q7cdm7pz6f2x2fmufrs6
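The two-minute delay in these listings is consistent with HEALTHCHECK's default settings (--interval=30s, --retries=3), assuming the first check passes and every later one fails. The arithmetic can be sketched as follows; the function name is hypothetical, and the time each check itself takes is ignored, so the result is only approximate.

```shell
# Rough time in seconds until a container is reported unhealthy, assuming:
#   $1 = interval between checks in seconds
#   $2 = consecutive failures needed to flip the status (retries)
#   $3 = number of checks that pass before the failures begin
seconds_until_unhealthy() {
  interval=$1
  retries=$2
  passing_checks=$3
  echo $(( interval * (passing_checks + retries) ))
}

# Docker's defaults with one passing check: checks at roughly 30s, 60s, 90s and 120s
seconds_until_unhealthy 30 3 1   # prints 120
```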
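The cause is that wget behaves differently on its first run than on later runs:
The first health check saved index.html to the local working directory, so every later check failed because the file already existed: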
/ # wget http://localhost:80
Connecting to localhost:80 (127.0.0.1:80)
wget: can't open 'index.html': File exists
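The failure mode can be reproduced without Docker or wget. The sketch below is only a simulation: it uses the shell's noclobber option to mimic a command that refuses to overwrite an existing file, and both function names are hypothetical. It contrasts a check that leaves a file behind with one that discards its output, as the original -O /dev/null version did.

```shell
# Simulation only: mimic a check that saves its download to a file and
# refuses to overwrite it, like the "wget URL" check in the session above.
check_saving_to_file() {
  # set -C (noclobber) makes ">" fail when the target file already exists
  ( set -C; echo "page body" > "$1/index.html" ) 2>/dev/null
}

# Mimic "wget -O /dev/null URL": the output is discarded, nothing is left behind.
check_discarding_output() {
  echo "page body" > /dev/null
}

workdir=$(mktemp -d)
check_saving_to_file "$workdir" && echo "run 1: ok"
check_saving_to_file "$workdir" || echo "run 2: failed, index.html already exists"
check_discarding_output && check_discarding_output && echo "discarding version: ok on every run"
rm -rf "$workdir"
```

The point of the design is that a health check should have no side effects that change the outcome of the next run.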
Third: to debug a HEALTHCHECK, use the following command, which shows the health check history and the output of each check:
docker inspect --format='{{json .State.Health}}' your-container-name
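For a more readable view, the health record can be printed field by field. This is a sketch under the assumption that the container defines a health check; the wrapper names health_log and health_summary are made up for illustration, while Status, FailingStreak, Log, ExitCode and Output are fields of the health state that docker inspect reports.

```shell
# Print one entry per recorded health check: exit code, then the command's output.
# Docker keeps only the most recent results in .State.Health.Log.
health_log() {
  docker inspect \
    --format '{{range .State.Health.Log}}exit={{.ExitCode}} output={{.Output}}{{println}}{{end}}' \
    "$1"
}

# Print the current status and the number of consecutive failed checks.
health_summary() {
  docker inspect \
    --format '{{.State.Health.Status}} (failing streak: {{.State.Health.FailingStreak}})' \
    "$1"
}
```

Running health_log your-container-name on the failing container above should surface the "File exists" message from wget alongside a non-zero exit code.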