The most basic form of recovery is the simple death-restart mechanism. The QNX Neutrino realtime operating system provides virtually all non-kernel functionality via user-installable programs, and it offers complete memory protection, not only for user applications but also for OS components (device drivers, filesystems, etc.). As a result, a resource manager or other server program can easily be decoupled from the OS.
This decoupling lets you safely stop, start, and upgrade resource managers or other key programs dynamically, without compromising the availability of the rest of the system.
Consider the following code, where we restart the inetd daemon:
/* addinet.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/netmgr.h>
#include <fcntl.h>
#include <ha/ham.h>

int main(int argc, char *argv[])
{
    int status;
    char *inetdpath;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    int inetdpid;

    if (argc > 1)
        inetdpath = strdup(argv[1]);
    else
        inetdpath = strdup("/usr/sbin/inetd -D");
    if (argc > 2)
        inetdpid = atoi(argv[2]);
    else
        inetdpid = -1;

    ham_connect(0);
    ehdl = ham_attach("inetd", ND_LOCAL_NODE, inetdpid, inetdpath, 0);
    if (ehdl != NULL) {
        chdl = ham_condition(ehdl, CONDDEATH, "death", HREARMAFTERRESTART);
        if (chdl != NULL) {
            ahdl = ham_action_restart(chdl, "restart", inetdpath,
                                      HREARMAFTERRESTART);
            if (ahdl == NULL)
                printf("add action failed\n");
        } else
            printf("add condition failed\n");
    } else
        printf("add entity failed\n");
    ham_disconnect(0);
    exit(0);
}
The above example attaches the inetd process to a HAM, and then establishes a death condition and a restart action under it.
You need to specify the -D option to inetd, to force inetd to daemonize by calling procmgr_daemon() instead of by calling daemon(). The HAM can see death messages only from tasks that are running in session 1, and the call to daemon() doesn't put the caller into that session.
When inetd terminates, the HAM automatically restarts it by running the program specified in inetdpath. If inetd is already running on the system, you can pass the pid of the existing inetd in inetdpid, and the HAM will attach to it directly. Otherwise, the HAM starts inetd and begins to monitor it.
You could use the same code to monitor, say, slogger (by specifying /usr/sbin/slogger), mqueue (by specifying /sbin/mqueue), etc. Just remember to specify the full path of the executable with all its required command-line parameters.
Recovery often involves more than restarting a single component. The death of one component might actually require restarting and resetting many other components. We might also have to do some initial cleanup before the dead component is restarted.
A HAM lets you specify a list of actions that will be performed when a given condition is triggered. For example, suppose the entity being monitored is fs-nfs2, and there's a set of directories that have been mounted and are currently in use. If fs-nfs2 were to die, simply restarting that component wouldn't remount the directories and make them available again! We'd have to restart fs-nfs2, and then follow that up with the explicit mounting of the appropriate directories.
Similarly, if io-pkt* were to die, it would take down the network drivers and the TCP/IP stack (npm-tcpip.so) with it. So restarting io-pkt* also involves reinitializing the network driver. And any other components that use the network connection (like inetd) will need to be reset as well, so that they can reestablish their connections.
Consider the following example, which performs a compound restart:
/* addnfs.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/netmgr.h>
#include <fcntl.h>
#include <ha/ham.h>

int main(int argc, char *argv[])
{
    int status;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    char *fsnfspath;
    int fsnfs2pid;

    if (argc > 1)
        fsnfspath = strdup(argv[1]);
    else
        fsnfspath = strdup("/usr/sbin/fs-nfs2");
    if (argc > 2)
        fsnfs2pid = atoi(argv[2]);
    else
        fsnfs2pid = -1;

    ham_connect(0);
    ehdl = ham_attach("Fs-nfs2", ND_LOCAL_NODE, fsnfs2pid, fsnfspath, 0);
    if (ehdl != NULL) {
        chdl = ham_condition(ehdl, CONDDEATH, "Death", HREARMAFTERRESTART);
        if (chdl != NULL) {
            ahdl = ham_action_restart(chdl, "Restart", fsnfspath,
                                      HREARMAFTERRESTART);
            if (ahdl == NULL)
                printf("add action failed\n");
            /*
            else {
                ahdl = ham_action_waitfor(chdl, "Delay1", NULL, 2000,
                                          HREARMAFTERRESTART);
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_execute(chdl, "MountPPCBE",
                          "/bin/mount -t nfs 10.12.1.115:/ppcbe /ppcbe",
                          HREARMAFTERRESTART |
                          ((fsnfs2pid == -1) ? HACTIONDONOW : 0));
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_waitfor(chdl, "Delay2", NULL, 2000,
                                          HREARMAFTERRESTART);
                if (ahdl == NULL)
                    printf("add action failed\n");
                ahdl = ham_action_execute(chdl, "MountWeb",
                          "/bin/mount -t nfs 10.12.1.115:/web /web",
                          HREARMAFTERRESTART |
                          ((fsnfs2pid == -1) ? HACTIONDONOW : 0));
                if (ahdl == NULL)
                    printf("add action failed\n");
            }
            */
        } else
            printf("add condition failed\n");
    } else
        printf("add entity failed\n");
    ham_disconnect(0);
    exit(0);
}
This example attaches fs-nfs2 as an entity, and then attaches a series of execute and waitfor actions to the condition death. When fs-nfs2 dies, HAM will restart it and also remount the remote directories that need to be remounted in sequence. Note that you can specify delays as actions and also wait for specific names to appear in the namespace.
Fault notification is a crucial part of the availability of a system. Apart from performing recovery per se, we also need to keep track of failures in order to be able to analyze the system at a later point.
For fault notification, you can use standard notification mechanisms such as pulses or signals. Clients specify the pulse or signal (with specific values) they want for each notification, and a HAM delivers the notifications at the appropriate times.
/* regevent.c */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/neutrino.h>
#include <sys/iomsg.h>
#include <sys/netmgr.h>
#include <signal.h>
#include <ha/ham.h>

#define PCODEINETDDEATH    _PULSE_CODE_MINAVAIL+1
#define PCODEINETDDETACH   _PULSE_CODE_MINAVAIL+2
#define PCODENFSDELAYED    _PULSE_CODE_MINAVAIL+3
#define PCODEINETDRESTART1 _PULSE_CODE_MINAVAIL+4
#define PCODEINETDRESTART2 _PULSE_CODE_MINAVAIL+5
#define MYSIG SIGRTMIN+1

int fsnfs_value;

/* Signal handler to handle the death notify of fs-nfs2 */
void MySigHandler(int signo, siginfo_t *info, void *extra)
{
    printf("Received signal %d, with code = %d, value %d\n",
           signo, info->si_code, info->si_value.sival_int);
    if (info->si_value.sival_int == fsnfs_value)
        printf("FS-nfs2 died, this is the notify signal\n");
    return;
}

int main(int argc, char *argv[])
{
    int chid, coid, rcvid;
    struct _pulse pulse;
    pid_t pid;
    int status;
    int value;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    struct sigaction sa;
    int scode;
    int svalue;

    /* we need a channel to receive the pulse notification on */
    chid = ChannelCreate(0);

    /* and we need a connection to that channel for the pulse
       to be delivered on */
    coid = ConnectAttach(0, 0, chid, _NTO_SIDE_CHANNEL, 0);

    /* fill in the event structure for a pulse */
    pid = getpid();
    value = 13;

    ham_connect(0);

    /* Assumes there is already an entity by the name "inetd" */
    chdl = ham_condition_handle(ND_LOCAL_NODE, "inetd", "death", 0);
    ahdl = ham_action_notify_pulse(chdl, "notifypulsedeath", ND_LOCAL_NODE,
                                   pid, chid, PCODEINETDDEATH, value,
                                   HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ham_condition_handle_free(chdl);

    ehdl = ham_entity_handle(ND_LOCAL_NODE, "inetd", 0);
    chdl = ham_condition(ehdl, CONDDETACH, "detach", HREARMAFTERRESTART);
    ahdl = ham_action_notify_pulse(chdl, "notifypulsedetach", ND_LOCAL_NODE,
                                   pid, chid, PCODEINETDDETACH, value,
                                   HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ham_condition_handle_free(chdl);
    ham_entity_handle_free(ehdl);

    fsnfs_value = 18; /* value we expect when fs-nfs2 dies */
    scode = 0;
    svalue = fsnfs_value;
    sa.sa_sigaction = MySigHandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sigaction(MYSIG, &sa, NULL);

    /* Assumes there is an entity by the name "Fs-nfs2".
       We use "Fs-nfs2" to symbolically represent the entity fs-nfs2.
       Any name can be used to represent the entity, but it's best
       to use a readable and meaningful name. */
    ehdl = ham_entity_handle(ND_LOCAL_NODE, "Fs-nfs2", 0);

    /* Add a new condition, which will be an "independent" condition;
       this means that notifications/actions inside this condition
       are not affected by "waitfor" delays in other action
       sequence threads */
    chdl = ham_condition(ehdl, CONDDEATH, "DeathSep",
                         HCONDINDEPENDENT | HREARMAFTERRESTART);
    ahdl = ham_action_notify_signal(chdl, "notifysignaldeath", ND_LOCAL_NODE,
                                    pid, MYSIG, scode, svalue,
                                    HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ham_condition_handle_free(chdl);
    ham_entity_handle_free(ehdl);

    chdl = ham_condition_handle(ND_LOCAL_NODE, "Fs-nfs2", "Death", 0);

    /* This action is added to a condition that does not have
       HCONDNOWAIT. Since we don't know what the condition already
       contains, we might get a delayed notification, because the
       action sequence might have "arbitrary" delays, and "waits",
       in it. */
    ahdl = ham_action_notify_pulse(chdl, "delayednfsdeathpulse",
                                   ND_LOCAL_NODE, pid, chid,
                                   PCODENFSDELAYED, value,
                                   HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ham_condition_handle_free(chdl);

    ehdl = ham_entity_handle(ND_LOCAL_NODE, "inetd", 0);
    chdl = ham_condition(ehdl, CONDRESTART, "restart",
                         HREARMAFTERRESTART | HCONDINDEPENDENT);
    ahdl = ham_action_notify_pulse(chdl, "notifyrestart_imm", ND_LOCAL_NODE,
                                   pid, chid, PCODEINETDRESTART1, value,
                                   HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ahdl = ham_action_waitfor(chdl, "delay", NULL, 6532, HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ahdl = ham_action_notify_pulse(chdl, "notifyrestart_delayed",
                                   ND_LOCAL_NODE, pid, chid,
                                   PCODEINETDRESTART2, value,
                                   HREARMAFTERRESTART);
    ham_action_handle_free(ahdl);
    ham_condition_handle_free(chdl);
    ham_entity_handle_free(ehdl);

    while (1) {
        rcvid = MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL);
        if (rcvid < 0) {
            if (errno != EINTR) {
                exit(-1);
            }
        } else {
            switch (pulse.code) {
                case PCODEINETDDEATH:
                    printf("Inetd Death Pulse\n");
                    break;
                case PCODENFSDELAYED:
                    printf("Fs-nfs2 died: this is the possibly delayed pulse\n");
                    break;
                case PCODEINETDDETACH:
                    printf("Inetd detached, so quitting\n");
                    goto the_end;
                case PCODEINETDRESTART1:
                    printf("Inetd Restart Pulse: Immediate\n");
                    break;
                case PCODEINETDRESTART2:
                    printf("Inetd Restart Pulse: Delayed\n");
                    break;
            }
        }
    }

    /* At this point we are no longer waiting for information about
       inetd, since we know that it has exited. We'll still continue
       to get information about the death of fs-nfs2, since we did
       not remove those actions. If we exit now, the next time those
       actions are executed they will fail (notifications fail if the
       receiver doesn't exist anymore), and they will automatically
       be removed and cleaned up. */
the_end:
    ham_disconnect(0);
    exit(0);
}
In the above example, a client registers for various types of notifications about significant events concerning inetd and fs-nfs2. Notifications can be delivered immediately or after a certain delay.
The notifications can also be received for each condition independently — for the entity's death (CONDDEATH), restart (CONDRESTART), and detaching (CONDDETACH).
A HAM asserts the CONDRESTART condition when an entity is successfully restarted.
Sometimes components become unavailable not because a specific "bad" event occurred, but because the component gets stuck and becomes unresponsive, to the point where the service it provides is effectively unavailable.
One example of this is when a process or a collection of processes/threads enters a state of deadlock or starvation, where none (or only some) of the involved processes can make any useful progress. Such situations are often difficult to pinpoint because they occur quite randomly.
You can have your clients assert "liveness" properties by actively sending heartbeats to a HAM. When a process deadlocks (or starves) and stops making progress, it also stops heartbeating; the HAM automatically detects this condition and takes corrective action.
The corrective action can range from simply terminating the offending application to restarting it and also delivering notifications about its state to other components that depend on the safe and correct functioning of this component. If necessary, a HAM can restart those other components as well.
We can demonstrate this condition by showing a simple process that has two threads that use mutual-exclusion locks incorrectly (by a design flaw), which causes them on occasion to enter a state of deadlock — each of the threads holds a resource that the other wants.
Essentially, each thread runs through a segment of code that involves the use of two mutexes.
Thread 1                      Thread 2
...                           ...
while true                    while true
do                            do
    obtain lock a                 obtain lock b
    (compute section1)            (compute section1)
    obtain lock b                 obtain lock a
    (compute section2)            (compute section2)
    release lock b                release lock a
    release lock a                release lock b
done                          done
...                           ...
The code segments for each thread are shown below. The only difference between the two is the order in which the locks are obtained. The two threads deadlock upon execution, quite randomly; the exact moment of deadlock depends on the lengths of the "compute sections" of the two threads.
/* mutexdeadlock.c */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <pthread.h>
#include <process.h>
#include <sys/neutrino.h>
#include <sys/procfs.h>
#include <sys/procmgr.h>
#include <ha/ham.h>

pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;
FILE *logfile;
pthread_t threadID;
int doheartbeat = 0;

#define COMPUTE_DELAY 100

void *func1(void *arg)
{
    int id;

    /* Obtain the two locks in the order a -> b, perform some
       computation, and then release the locks ...
       do this continuously. */
    id = pthread_self();
    while (1) {
        delay(85); /* delay to let the other one go */
        if (doheartbeat)
            ham_heartbeat();
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "Thread 1: Obtained lock a\n");
        fprintf(logfile, "Thread 1: Waiting for lock b\n");
        pthread_mutex_lock(&mutex_b);
        fprintf(logfile, "Thread 1: Obtained lock b\n");
        fprintf(logfile, "Thread 1: Performing computation\n");
        delay(rand() % COMPUTE_DELAY + 5); /* delay for computation */
        fprintf(logfile, "Thread 1: Unlocking lock b\n");
        pthread_mutex_unlock(&mutex_b);
        fprintf(logfile, "Thread 1: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return NULL;
}

void *func2(void *arg)
{
    int id;

    /* Obtain the two locks in the order b -> a, perform some
       computation, and then release the locks ...
       do this continuously. */
    id = pthread_self();
    while (1) {
        delay(25);
        if (doheartbeat)
            ham_heartbeat();
        pthread_mutex_lock(&mutex_b);
        fprintf(logfile, "\tThread 2: Obtained lock b\n");
        fprintf(logfile, "\tThread 2: Waiting for lock a\n");
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "\tThread 2: Obtained lock a\n");
        fprintf(logfile, "\tThread 2: Performing computation\n");
        delay(rand() % COMPUTE_DELAY + 5); /* delay for computation */
        fprintf(logfile, "\tThread 2: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
        fprintf(logfile, "\tThread 2: Unlocking lock b\n");
        pthread_mutex_unlock(&mutex_b);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_attr_t attrib;
    struct sched_param param;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    int i = 0;
    int c;

    logfile = stderr;
    while ((c = getopt(argc, argv, "f:l")) != -1) {
        switch (c) {
            case 'f': /* log file */
                logfile = fopen(optarg, "w");
                break;
            case 'l': /* do liveness heartbeating */
                if (access("/proc/ham", F_OK) == 0)
                    doheartbeat = 1;
                break;
        }
    }
    setbuf(logfile, NULL);
    srand(time(NULL));
    fprintf(logfile, "Creating separate competing compute thread\n");

    pthread_attr_init(&attrib);
    pthread_attr_setinheritsched(&attrib, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attrib, SCHED_RR);
    param.sched_priority = getprio(0);
    pthread_attr_setschedparam(&attrib, &param);

    if (doheartbeat) {
        /* attach to a HAM */
        ehdl = ham_attach_self("mutex-deadlock", 1000000000UL, 5, 5, 0);
        chdl = ham_condition(ehdl, CONDHBEATMISSEDHIGH,
                             "heartbeat-missed-high", 0);
        ahdl = ham_action_execute(chdl, "terminate",
                   "/proc/boot/mutex-deadlock-heartbeat.sh", 0);
    }

    /* create competitor thread */
    pthread_create(&threadID, &attrib, func1, NULL);
    pthread_detach(threadID);

    func2(NULL);
    exit(0);
}
Upon execution, what we see is:
The threads will execute as described earlier, but will eventually deadlock. We'll wait for a reasonable amount of time (a few seconds) until they do end in deadlock. The threads write out a simple execution log into /dev/shmem/mutex-deadlock.log.
Here's the current state of the threads in process 73746:
pid     tid name               prio STATE  Blocked
73746     1 oot/mutex-deadlock  10r MUTEX  73746-02 #-21474
73746     2 oot/mutex-deadlock  10r MUTEX  73746-01 #-21474
And here's the tail from the threads' log file:
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 2: Obtained lock a
    Thread 2: Performing computation
    Thread 2: Unlocking lock a
    Thread 2: Unlocking lock b
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
Thread 1: Obtained lock a
Thread 1: Waiting for lock b
/tmp/mutex-deadlock.core:
 processor=PPC num_cpus=2
  cpu 1 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
  cpu 2 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
 cyc/sec=16666666 tod_adj=999522656000000000 nsec=5190771360840 inc=999960
 boot=999522656 epoch=1970 intr=-2147483648 rate=600000024 scale=-16 load=16666
 MACHINE="mtx604-smp" HOSTNAME="localhost"
 hwflags=0x000004 pretend_cpu=0 init_msr=36866
 pid=73746 parent=49169 child=0 pgrp=73746 sid=1
 flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803fa20
 ruid=0 euid=0 suid=0 rgid=0 egid=0 sgid=0
 ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
 fds=4 threads=2 timers=0 chans=1
 thread 1 REQUESTED
  ip=0xfe32f838 sp=0x4803f920 stkbase=0x47fbf000 stksize=528384
  state=MUTEX flags=0 last_cpu=1 timeout=00000000 pri=10 realpri=10 policy=RR
 thread 2
  ip=0xfe32f838 sp=0x47fbef80 stkbase=0x47f9e000 stksize=135168
  state=MUTEX flags=4020000 last_cpu=2 timeout=00000000 pri=10 realpri=10 policy=RR
The threads are deadlocked, each holding one lock while waiting for the other.
Now consider the case where the client can be made to heartbeat so that a HAM will automatically detect when it's unresponsive and will terminate it.
Thread 1                      Thread 2
...                           ...
while true                    while true
do                            do
    obtain lock a                 obtain lock b
    (compute section1)            (compute section1)
    obtain lock b                 obtain lock a
    send heartbeat                send heartbeat
    (compute section2)            (compute section2)
    release lock b                release lock a
    release lock a                release lock b
done                          done
...                           ...
Here the process is expected to send heartbeats to a HAM. Because the heartbeat call is inside the loop, the deadlock traps it: once the threads deadlock, the heartbeats stop. The HAM notices that the heartbeats have stopped and can then perform recovery.
Let's look at what happens now:
The threads will execute as described earlier, but will eventually deadlock. We'll wait for a reasonable amount of time (a few seconds) until they do end in deadlock. The threads write out a simple execution log into /dev/shmem/mutex-deadlock-heartbeat.log. The HAM detects that the threads have stopped heartbeating and terminates the process, after saving its state for postmortem analysis.
Here's the current state of the threads in process 462866 and the state of mutex-deadlock when it missed heartbeats:
pid      tid name               prio STATE    Blocked
462866     1 oot/mutex-deadlock  10r MUTEX    462866-03 #-2147
462866     2 oot/mutex-deadlock  63r RECEIVE  1
462866     3 oot/mutex-deadlock  10r MUTEX    462866-01 #-2147

Entity state from HAM:

Path             : mutex-deadlock
Entity Pid       : 462866
Num conditions   : 1
Condition type   : ATTACHEDSELF
Stats:
  HeartBeat Period : 1000000000
  HB Low Mark      : 5
  HB High Mark     : 5
  Last Heartbeat   : 2001/09/03 14:40:41:406575120
  HeartBeat State  : MISSEDHIGH
Created          : 2001/09/03 14:40:40:391615720
Num Restarts     : 0
And here's the tail from the threads' log file:
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
    Thread 2: Obtained lock a
    Thread 2: Performing computation
    Thread 2: Unlocking lock a
    Thread 2: Unlocking lock b
    Thread 2: Obtained lock b
    Thread 2: Waiting for lock a
Thread 1: Obtained lock a
Thread 1: Waiting for lock b
/tmp/mutex-deadlock.core:
 processor=PPC num_cpus=2
  cpu 1 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
  cpu 2 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
 cyc/sec=16666666 tod_adj=999522656000000000 nsec=5390696363520 inc=999960
 boot=999522656 epoch=1970 intr=-2147483648 rate=600000024 scale=-16 load=16666
 MACHINE="mtx604-smp" HOSTNAME="localhost"
 hwflags=0x000004 pretend_cpu=0 init_msr=36866
 pid=462866 parent=434193 child=0 pgrp=462866 sid=1
 flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803f9f0
 ruid=0 euid=0 suid=0 rgid=0 egid=0 sgid=0
 ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
 fds=5 threads=3 timers=1 chans=4
 thread 1 REQUESTED
  ip=0xfe32f838 sp=0x4803f8f0 stkbase=0x47fbf000 stksize=528384
  state=MUTEX flags=0 last_cpu=2 timeout=00000000 pri=10 realpri=10 policy=RR
 thread 2
  ip=0xfe32f1a8 sp=0x47fbef50 stkbase=0x47f9e000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000 pri=63 realpri=63 policy=RR
  blocked_chid=1
 thread 3
  ip=0xfe32f838 sp=0x47f9df80 stkbase=0x47f7d000 stksize=135168
  state=MUTEX flags=4020000 last_cpu=1 timeout=00000000 pri=10 realpri=10 policy=RR
We can demonstrate starvation, too, with a simple process containing two threads that use a mutual-exclusion lock to manage a critical section. Thread 1 runs at a high priority, while Thread 2 runs at a lower priority. Essentially, each thread runs through a segment of code that looks like this:
Thread 1                      Thread 2
...                           ...
(Run at high priority)        (Run at low priority)
while true                    while true
do                            do
    obtain lock a                 obtain lock a
    (compute section1)            (compute section1)
    release lock a                release lock a
done                          done
...                           ...
The code segments for each thread are shown below; the only difference between them is the priority of the two threads. Upon execution, Thread 2 eventually starves.
/* mutexstarvation.c */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <pthread.h>
#include <process.h>
#include <sys/neutrino.h>
#include <sys/procfs.h>
#include <sys/procmgr.h>
#include <ha/ham.h>

pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
FILE *logfile;
int doheartbeat = 0;

#define COMPUTE_DELAY 900

void *func1(void *arg)
{
    int id;

    id = pthread_self();
    while (1) {
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "Thread 1: Locking lock a\n");
        delay(rand() % COMPUTE_DELAY + 50); /* delay for computation */
        fprintf(logfile, "Thread 1: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return NULL;
}

void *func2(void *arg)
{
    int id;

    id = pthread_self();
    while (1) {
        pthread_mutex_lock(&mutex_a);
        fprintf(logfile, "\tThread 2: Locking lock a\n");
        if (doheartbeat)
            ham_heartbeat();
        delay(rand() % COMPUTE_DELAY + 50); /* delay for computation */
        fprintf(logfile, "\tThread 2: Unlocking lock a\n");
        pthread_mutex_unlock(&mutex_a);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_attr_t attrib;
    struct sched_param param;
    ham_entity_t *ehdl;
    ham_condition_t *chdl;
    ham_action_t *ahdl;
    int i = 0;
    int c;
    pthread_attr_t attrib2;
    struct sched_param param2;
    pthread_t threadID;
    pthread_t threadID2;

    logfile = stderr;
    while ((c = getopt(argc, argv, "f:l")) != -1) {
        switch (c) {
            case 'f': /* log file */
                logfile = fopen(optarg, "w");
                break;
            case 'l': /* do liveness heartbeating */
                if (access("/proc/ham", F_OK) == 0)
                    doheartbeat = 1;
                break;
        }
    }
    setbuf(logfile, NULL);
    srand(time(NULL));
    fprintf(logfile, "Creating separate competing compute thread\n");

    if (doheartbeat) {
        /* attach to a HAM */
        ehdl = ham_attach_self("mutex-starvation", 1000000000UL, 5, 5, 0);
        chdl = ham_condition(ehdl, CONDHBEATMISSEDHIGH,
                             "heartbeat-missed-high", 0);
        ahdl = ham_action_execute(chdl, "terminate",
                   "/proc/boot/mutex-starvation-heartbeat.sh", 0);
    }

    /* create the low-priority competitor thread first */
    pthread_attr_init(&attrib2);
    pthread_attr_setinheritsched(&attrib2, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attrib2, SCHED_RR);
    param2.sched_priority = sched_get_priority_min(SCHED_RR);
    pthread_attr_setschedparam(&attrib2, &param2);
    pthread_create(&threadID2, &attrib2, func2, NULL);

    delay(3000); /* let the other thread go on for a while... */

    pthread_attr_init(&attrib);
    pthread_attr_setinheritsched(&attrib, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attrib, SCHED_RR);
    param.sched_priority = sched_get_priority_max(SCHED_RR);
    pthread_attr_setschedparam(&attrib, &param);
    pthread_create(&threadID, &attrib, func1, NULL);

    pthread_join(threadID, NULL);
    pthread_join(threadID2, NULL);
    exit(0);
}
Upon execution, here's what we see:
The threads will execute as described earlier, but eventually Thread 2 will starve. We'll wait for a reasonable amount of time (some seconds) until Thread 2 ends up starving. The threads write out a simple execution log into /dev/shmem/mutex-starvation.log.
Here's the current state of the threads in process 622610:
pid      tid name               prio STATE      Blocked
622610     1 t/mutex-starvation  10r JOIN       3
622610     2 t/mutex-starvation   1r MUTEX      622610-03 #-2147
622610     3 t/mutex-starvation  63r NANOSLEEP
And here's the tail from the threads' log file:
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
/tmp/mutex-starvation.core:
 processor=PPC num_cpus=2
  cpu 1 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
  cpu 2 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
 cyc/sec=16666666 tod_adj=999522656000000000 nsec=5561011550640 inc=999960
 boot=999522656 epoch=1970 intr=-2147483648 rate=600000024 scale=-16 load=16666
 MACHINE="mtx604-smp" HOSTNAME="localhost"
 hwflags=0x000004 pretend_cpu=0 init_msr=36866
 pid=622610 parent=598033 child=0 pgrp=622610 sid=1
 flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803fa10
 ruid=0 euid=0 suid=0 rgid=0 egid=0 sgid=0
 ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
 fds=4 threads=3 timers=0 chans=1
 thread 1 REQUESTED
  ip=0xfe32f8c8 sp=0x4803f8a0 stkbase=0x47fbf000 stksize=528384
  state=JOIN flags=0 last_cpu=1 timeout=00000000 pri=10 realpri=10 policy=RR
 thread 2
  ip=0xfe32f838 sp=0x47fbef80 stkbase=0x47f9e000 stksize=135168
  state=MUTEX flags=4000000 last_cpu=2 timeout=00000000 pri=1 realpri=1 policy=RR
 thread 3
  ip=0xfe32f9a0 sp=0x47f9df20 stkbase=0x47f7d000 stksize=135168
  state=NANOSLEEP flags=4000000 last_cpu=2 timeout=0x1001000 pri=63 realpri=63 policy=RR
Now consider the case where Thread 2 is made to heartbeat. A HAM will automatically detect when the thread is unresponsive and can terminate it and/or perform recovery.
Thread 1                      Thread 2
...                           ...
(Run at high priority)        (Run at low priority)
while true                    while true
do                            do
    obtain lock a                 obtain lock a
                                  send heartbeat
    (compute section1)            (compute section1)
    release lock a                release lock a
done                          done
...                           ...
Here Thread 2 is expected to send heartbeats to a HAM. Because the heartbeat call is inside the loop, the HAM detects when Thread 2 begins to starve.
The threads will execute as described earlier, but eventually Thread 2 will starve. We'll wait for a reasonable amount of time (some seconds) until it does. The threads write out a simple execution log into /dev/shmem/mutex-starvation-heartbeat.log. The HAM detects that the thread has stopped heartbeating and terminates the process, after saving its state for postmortem analysis.
Let's look at what happens:
Here's the current state of the threads in process 753682 and the state of mutex-starvation when it missed heartbeats:
pid      tid name               prio STATE      Blocked
753682     1 t/mutex-starvation  10r JOIN       4
753682     2 t/mutex-starvation  63r RECEIVE    1
753682     3 t/mutex-starvation   1r MUTEX      753682-04 #-2147
753682     4 t/mutex-starvation  63r NANOSLEEP

Entity state from HAM:

Path             : mutex-starvation
Entity Pid       : 753682
Num conditions   : 1
Condition type   : ATTACHEDSELF
Stats:
  HeartBeat Period : 1000000000
  HB Low Mark      : 5
  HB High Mark     : 5
  Last Heartbeat   : 2001/09/03 14:44:37:796119160
  HeartBeat State  : MISSEDHIGH
Created          : 2001/09/03 14:44:34:780239800
Num Restarts     : 0
And here's the tail from the threads' log file:
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
Thread 1: Unlocking lock a
Thread 1: Locking lock a
/tmp/mutex-starvation.core:
 processor=PPC num_cpus=2
  cpu 1 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
  cpu 2 cpu=602370 name=604e speed=299 flags=0xc0000001 FPU MMU EAR
 cyc/sec=16666666 tod_adj=999522656000000000 nsec=5627098907040 inc=999960
 boot=999522656 epoch=1970 intr=-2147483648 rate=600000024 scale=-16 load=16666
 MACHINE="mtx604-smp" HOSTNAME="localhost"
 hwflags=0x000004 pretend_cpu=0 init_msr=36866
 pid=753682 parent=729105 child=0 pgrp=753682 sid=1
 flags=0x000300 umask=0 base_addr=0x48040000 init_stack=0x4803f9f0
 ruid=0 euid=0 suid=0 rgid=0 egid=0 sgid=0
 ign=0000000006801000 queue=ff00000000000000 pending=0000000000000000
 fds=5 threads=4 timers=1 chans=4
 thread 1 REQUESTED
  ip=0xfe32f8c8 sp=0x4803f880 stkbase=0x47fbf000 stksize=528384
  state=JOIN flags=0 last_cpu=2 timeout=00000000 pri=10 realpri=10 policy=RR
 thread 2
  ip=0xfe32f1a8 sp=0x47fbef50 stkbase=0x47f9e000 stksize=135168
  state=RECEIVE flags=4000000 last_cpu=2 timeout=00000000 pri=63 realpri=63 policy=RR
  blocked_chid=1
 thread 3
  ip=0xfe32f838 sp=0x47f9df80 stkbase=0x47f7d000 stksize=135168
  state=MUTEX flags=4000000 last_cpu=2 timeout=00000000 pri=1 realpri=1 policy=RR
 thread 4
  ip=0xfe32f9a0 sp=0x47f7cf20 stkbase=0x47f5c000 stksize=135168
  state=NANOSLEEP flags=4000000 last_cpu=1 timeout=0x1001000 pri=63 realpri=63 policy=RR