Всем привет!
На одном из серверов постоянно падает один процесс. Добавил это процесс в исключения для OOM killer, чтобы система не роняла его, когда загружена память. Но что-то это не сильно помогло.
ОС
Код | Linux 2.6.32-358.11.1.el6.x86_64
|
Системные логи (/var/log/messages)
Код | Dec 8 01:03:40 server2 init: clusterd2 main process (20959) killed by PIPE signal Dec 8 01:03:40 server2 init: clusterd2 main process ended, respawning Dec 8 01:03:40 server2 sv-sma: ERROR: Connect to terminal MU (10.10.203.135:4997): Error SCTP-connection Dec 8 01:03:40 server2 sv-integrity: Exiting by the signal 15 Dec 8 03:28:14 server2 init: clusterd2 main process (26703) killed by PIPE signal Dec 8 03:28:14 server2 init: clusterd2 main process ended, respawning Dec 8 03:43:02 server2 rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date. Dec 8 03:43:34 server2 init: clusterd2 main process (21546) killed by PIPE signal Dec 8 03:43:34 server2 init: clusterd2 main process ended, respawning Dec 8 06:37:41 server2 auditd[2256]: Audit daemon rotating log files Dec 8 09:08:03 server2 init: clusterd2 main process (31511) killed by PIPE signal Dec 8 09:08:03 server2 init: clusterd2 main process ended, respawning Dec 8 10:08:15 server2 init: clusterd2 main process (11055) killed by PIPE signal Dec 8 10:08:15 server2 init: clusterd2 main process ended, respawning Dec 8 11:43:27 server2 init: clusterd2 main process (16973) killed by PIPE signal Dec 8 11:43:27 server2 init: clusterd2 main process ended, respawning Dec 8 12:27:07 server2 init: clusterd2 main process (12684) killed by PIPE signal Dec 8 12:27:07 server2 init: clusterd2 main process ended, respawning Dec 8 14:36:16 server2 init: clusterd2 main process (8077) killed by PIPE signal Dec 8 14:36:16 server2 init: clusterd2 main process ended, respawning Dec 8 15:40:19 server2 init: clusterd2 main process (25381) killed by PIPE signal Dec 8 15:40:19 server2 init: clusterd2 main process ended, respawning Dec 8 17:08:17 server2 init: clusterd2 main process (1343) killed by PIPE signal Dec 8 17:08:17 server2 init: clusterd2 main process ended, respawning Dec 8 21:47:39 server2 auditd[2256]: Audit daemon rotating log files Dec 8 22:05:13 server2 init: clusterd2 main process (25017) killed by PIPE signal Dec 8 22:05:13 server2 init: clusterd2 main process ended, respawning Dec 8 22:19:46 server2 init: clusterd2 main process (19214) killed by PIPE signal Dec 8 22:19:46 server2 init: clusterd2 main process ended, respawning Dec 9 00:30:00 server2 init: clusterd2 main process (28578) killed by PIPE signal Dec 9 00:30:00 server2 init: clusterd2 main process ended, respawning Dec 9 00:36:57 server2 init: clusterd2 main process (14220) killed by PIPE signal Dec 9 00:36:57 server2 init: clusterd2 main process ended, respawning Dec 9 00:51:15 server2 init: clusterd2 main process (18633) killed by PIPE signal Dec 9 00:51:15 server2 init: clusterd2 main process ended, respawning Dec 9 01:32:58 server2 init: clusterd2 main process (27847) killed by PIPE signal Dec 9 01:32:58 server2 init: clusterd2 main process ended, respawning Dec 9 01:52:12 server2 init: clusterd2 main process (22053) killed by PIPE signal Dec 9 01:52:12 server2 init: clusterd2 main process ended, respawning
|
Логи ядра (var/log/kern.log)
Код | [root@server2 tmp]# cat /var/log/kern.log Dec 7 19:22:37 server2 kernel: imklog 5.8.10, log source = /proc/kmsg started. [root@server2 tmp]#
|
Запуск/остановка процесса
Код | description "Служба обеспечения резерва сервера" author "Vitaliy V. Sayfullin" env CONF=/etc/opt/sparta/cluster2.conf env INFO=/var/sparta/info2 env INFO0=/var/sparta/info2.0 env LOG=/var/log/cluster2.log env CMD=/var/sparta/cmd2 env START=/var/sparta/start_cluster expect fork normal exit 0 1 TERM HUP start on filesystem or runlevel [2345] stop on runlevel [016] exec /usr/bin/cluster $CONF pre-start script echo $(date)": Служба обеспечения резерва запущена" >> $LOG end script pre-stop script echo stop > $CMD #wait for cluster stop while pgrep -f "/usr/bin/cluster $CONF" &>/dev/null ; do sleep 1; echo "Ожидаю остановки службы резерва" >> $LOG done rm -f $CMD rm -f $INFO rm -f $INFO0 echo $(date)": Служба обеспечения резерва остановлена" >> $LOG end script respawn
|
P.S. На другом сервере этот процесс работает стабильно, не падает. Таже ОС и такие же параметры. |