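Recently, to relieve the ever-growing load on our API, I tried load balancing at the application layer to route read/write traffic and internal/external traffic separately. That calls for OpenResty. The platform this time is CentOS 7, which manages services through systemd (systemctl), and I built and installed OpenResty from source myself, so I ran into a few problems worth writing down.
First off, installing OpenResty was uneventful: following the getting-started guide on the official site, the build went through without a hitch.
Download the latest release (1.11.2.1 at the time of writing),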
$ cd ~/
$ wget https://openresty.org/download/openresty-1.11.2.1.tar.gz
Unpack it,
$ tar -zxvf openresty-1.11.2.1.tar.gz
Install the required dependencies,
$ sudo yum install readline-devel pcre-devel openssl-devel gcc
Configure,
$ cd openresty-1.11.2.1
$ ./configure --prefix=/usr/local/openresty \
--with-pcre-jit \
--with-ipv6 \
--without-http_redis2_module \
--with-http_iconv_module \
-j2
Build & install,
$ gmake
$ gmake install
Create a user and group for openresty,
$ sudo useradd -d /var/lib/nginx nginx -s /sbin/nologin
$ sudo groupadd www-data
$ sudo usermod -aG www-data nginx
Change the ownership of the logs and html directories under the openresty installation directory,
$ sudo chown -R nginx:www-data /usr/local/openresty/nginx/{logs,html}
Write the openresty.service file,
$ cat /usr/lib/systemd/system/openresty.service
[Unit]
Description=OpenResty is a dynamic web platform based on NGINX and LuaJIT.
Documentation=http://openresty.org/en/
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/run/openresty.pid
ExecStartPre=/usr/bin/rm -f /run/openresty.pid
ExecStartPre=/usr/local/openresty/nginx/sbin/nginx -t -c /usr/local/openresty/nginx/conf/nginx.conf
ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
KillSignal=SIGQUIT
TimeoutStartSec=10
TimeoutStopSec=5
KillMode=process
PrivateTmp=true
Restart=on-failure
RestartSec=30s
[Install]
WantedBy=multi-user.target
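This file is partly based on the service file of the nginx package installed via yum on CentOS 7, with automatic restart on failure added.
If everything had gone smoothly from here, that would be the end of it, and there would be nothing to write about.
Let's start the service and see,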
$ sudo systemctl start openresty.service
After a long wait, this message came back,
$ sudo systemctl start openresty.service
Job for openresty.service failed because a timeout was exceeded. See "systemctl status openresty.service" and "journalctl -xe" for details.
What? A timeout error killed the startup. Following the hint, let's look for error details,
$ sudo systemctl status openresty.service
● openresty.service - OpenResty is a dynamic web platform based on NGINX and LuaJIT.
Loaded: loaded (/usr/lib/systemd/system/openresty.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Sat 2016-09-24 15:32:06 CST; 1min 10s ago
Docs: http://openresty.org/en/
Process: 25521 ExecStart=/usr/local/openresty/nginx/sbin/nginx -c /usr/local/openresty/nginx/conf/nginx.conf (code=exited, status=0/SUCCESS)
Process: 25516 ExecStartPre=/usr/local/openresty/nginx/sbin/nginx -t -c /usr/local/openresty/nginx/conf/nginx.conf (code=exited, status=0/SUCCESS)
Process: 25513 ExecStartPre=/usr/bin/rm -f /run/openresty.pid (code=exited, status=0/SUCCESS)
Main PID: 23647 (code=exited, status=0/SUCCESS)
Sep 24 15:30:36 foo systemd[1]: Starting OpenResty is a dynamic web platform based on NGINX and LuaJIT....
Sep 24 15:30:36 foo nginx[25516]: nginx: the configuration file /usr/local/openresty/nginx/conf/nginx.conf syntax is ok
Sep 24 15:30:36 foo nginx[25516]: nginx: configuration file /usr/local/openresty/nginx/conf/nginx.conf test is successful
Sep 24 15:30:36 foo systemd[1]: PID file /run/openresty.pid not readable (yet?) after start.
Sep 24 15:32:06 foo systemd[1]: openresty.service start operation timed out. Terminating.
Sep 24 15:32:06 foo systemd[1]: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
Sep 24 15:32:06 foo systemd[1]: Unit openresty.service entered failed state.
Sep 24 15:32:06 foo systemd[1]: openresty.service failed.
None the wiser, so next up is journalctl -xe,
$ sudo journalctl -xe
-- Unit user-0.slice has begun shutting down.
Sep 24 14:35:19 foo systemd[1]: openresty.service start operation timed out. Terminating.
Sep 24 14:35:19 foo systemd[1]: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
-- Subject: Unit openresty.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit openresty.service has failed.
--
-- The result is failed.
Sep 24 14:35:19 foo systemd[1]: Unit openresty.service entered failed state.
Sep 24 14:35:19 foo systemd[1]: openresty.service failed.
Sep 24 14:35:19 foo polkitd[1080]: Unregistered Authentication Agent
for unix-process:23722:139633287 (system bus name :1.52565,
object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus
lines 4854-4904/4904 (END)
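Googling the messages above did not turn up anything useful.
Disappointed, I checked openresty's own error log, but it was empty. I also took a quick look at /var/log/messages to see whether anything would turn up.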
$ tail -30 /var/log/messages|less
Sep 24 15:35:04 foo systemd: Cannot add dependency job for unit firewalld.service, ignoring: Unit firewalld.service is masked.
Sep 24 15:35:04 foo systemd: Starting OpenResty is a dynamic web platform based on NGINX and LuaJIT....
Sep 24 15:35:04 foo nginx: nginx: the configuration file /usr/local/openresty/nginx/conf/nginx.conf syntax is ok
Sep 24 15:35:04 foo nginx: nginx: configuration file /usr/local/openresty/nginx/conf/nginx.conf test is successful
Sep 24 15:35:04 foo systemd: PID file /run/openresty.pid not readable (yet?) after start.
Sep 24 15:35:12 foo systemd: Removed slice user-0.slice.
Sep 24 15:35:12 foo systemd: Stopping user-0.slice.
Sep 24 15:36:01 foo systemd: Created slice user-0.slice.
Sep 24 15:36:01 foo systemd: Starting user-0.slice.
Sep 24 15:36:01 foo systemd: Started Session 26276 of user root.
Sep 24 15:36:01 foo systemd: Starting Session 26276 of user root.
Sep 24 15:36:09 foo systemd: Removed slice user-0.slice.
Sep 24 15:36:09 foo systemd: Stopping user-0.slice.
Sep 24 15:36:34 foo systemd: openresty.service start operation timed out. Terminating.
Sep 24 15:36:34 foo systemd: Failed to start OpenResty is a dynamic web platform based on NGINX and LuaJIT..
Sep 24 15:36:34 foo systemd: Unit openresty.service entered failed state.
This one log line caught my eye,
Sep 24 15:35:04 foo systemd: PID file /run/openresty.pid not readable (yet?) after start.
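I had a hunch that this was the key. The start command failed with a timeout, yet here systemd says it cannot read the PID file. What does that tell us?
Running the start command again while watching /run/openresty.pid confirmed that the file is never created. At this point a reasonable guess is that systemd, following the service file, runs the two ExecStartPre commands and then ExecStart to start openresty, and afterwards relies on the configured PID file to find the process it is supposed to track. Since the PID file never shows up, systemd can never confirm that nginx has started, so it keeps waiting until the start timeout expires and then reports a timeout.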
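The mismatch suspected here can be checked mechanically. The following is a minimal sketch, not a systemd or nginx feature: the function names unit_pidfile, nginx_pidfile and pidfiles_match are made up for illustration, and the comparison is purely textual (a relative path in nginx.conf is not resolved against the nginx prefix).

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this post): compare the PIDFile=
# setting of a unit file with the active pid directive of an nginx.conf.

# Print the value of the first PIDFile= line in a unit file (empty if absent).
unit_pidfile() {
    sed -n 's/^[[:space:]]*PIDFile=//p' "$1" | head -n 1
}

# Print the path of the first uncommented "pid" directive (empty if absent).
# A commented-out line such as "#pid logs/nginx.pid;" is ignored on purpose.
nginx_pidfile() {
    sed -n 's/^[[:space:]]*pid[[:space:]][[:space:]]*\([^;[:space:]]*\)[[:space:]]*;.*$/\1/p' "$1" | head -n 1
}

# Return 0 when both files name the same PID file, 1 otherwise.
# Plain string comparison: relative paths are not resolved.
pidfiles_match() {
    unit_pid=$(unit_pidfile "$1")
    conf_pid=$(nginx_pidfile "$2")
    [ -n "$unit_pid" ] && [ "$unit_pid" = "$conf_pid" ]
}
```

With the paths from this post, the check would be pidfiles_match /usr/lib/systemd/system/openresty.service /usr/local/openresty/nginx/conf/nginx.conf, and a non-zero exit status would point at exactly the problem described above.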
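Checking openresty's default nginx.conf, the pid directive is commented out, so nginx writes its PID file to its default location (logs/nginx.pid under the nginx prefix) and /run/openresty.pid never comes into existence. That is why every attempt to start the service ended in a timeout.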
The fix is simple: uncomment the pid directive and set it to the same path as PIDFile in openresty.service. Start the service again and it comes up instantly.
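For anyone who prefers to script the change rather than edit the file by hand, here is a rough sketch. The helper name set_nginx_pid is hypothetical, and it assumes the config contains a pid line in the form shipped in the default nginx.conf (commented out or not) and that the target path contains no "|" character.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this post): point the pid directive
# of an nginx.conf at the given PID file.
# Every line of the form "pid ...;" or "#pid ...;" is rewritten; if the file
# has no such line, nothing is changed and 1 is returned.
set_nginx_pid() {
    conf=$1
    pidfile=$2
    grep -Eq '^[[:space:]]*#?[[:space:]]*pid[[:space:]]' "$conf" || return 1
    tmp=$(mktemp) || return 1
    if sed "s|^[[:space:]]*#\{0,1\}[[:space:]]*pid[[:space:]].*;.*\$|pid        $pidfile;|" "$conf" > "$tmp"; then
        # Write back with cat so the original file keeps its owner and mode.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

In the setup from this post the call would be set_nginx_pid /usr/local/openresty/nginx/conf/nginx.conf /run/openresty.pid, run with enough privileges to write the file. The ExecStartPre line in the unit already runs nginx -t, so a broken edit would be caught before the service starts.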
While I was at it, I also tried the restart-after-crash behavior. Let's give it a go,
$ pkill nginx
All nginx processes are gone, and the world is quiet.
Looking back at openresty.service, it contains these two settings,
Restart=on-failure
RestartSec=30s
Sure enough, after waiting 30s, the nginx process was magically started again.
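Instead of staring at the process list for half a minute, the wait can be scripted. This is only a sketch: wait_for is a hypothetical helper written for this post, and the 40-second budget in the example is an assumption chosen to be a little longer than RestartSec.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this post): run a command once per
# second until it succeeds or the given number of seconds has been used up.
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_for() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Example for the server in this post (left commented out on purpose):
# wait_for 40 pgrep -x nginx && echo "nginx is back"
```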
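[End]
PS: Looking back after finishing this post, the PID error had in fact already shown up in the journal (it is right there in the systemctl status output above); I just did not notice it.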