Linux Drive PC Code
Currently the drive PC runs on old computer hardware under the PSOS operating system. This hardware would be difficult to replace if it broke, and cannot easily be upgraded to improve performance. The aim of porting the drive PC code to Linux is to allow the use of more general computer hardware that can be replaced or upgraded more easily in the future if required.
The primary constraint with the porting of the drive PC code is that it must behave exactly as the old code
does from the point of view of a client connecting to it. That is, it is undesirable to need to reprogram each
client program (such as vdesk, bruce, the FS antenna control etc.) to make it usable with the new
system.
The secondary constraint is that we want the drive PC to be completely free of non-standard hardware that would be required to control the telescope. In fact, we would like the drive PC to be "in contact" with as little telescope hardware as possible. Instead, we would prefer to rely on some other less accessible piece of standalone hardware, whose internal processes are very simple, to communicate with the hardware based on what the drive PC instructs it to do, and to communicate back to the drive PC what state the hardware is in.
Main differences between old and new drive PC codes
- Where the old drive PC required a roottask to manage the startup process, the new drive PC uses the standard Linux startup scripts, and in fact has a standard Linux base. Currently, the new drive PC is based on Arch Linux.
 - The old drive PC uses semaphores to communicate between processes, while the new drive PC uses shared memory.
 - The old drive PC has a network socket listener that is always running waiting for a connection to a client process. The new drive PC uses xinetd to spawn monitor and control processes that then no longer have to deal explicitly with network communication.
 - The old drive PC could only manage a few monitoring connections at a time. This is in part because every client required the hartbeat process to service its request, and hartbeat was required to run completely within the 10 milliseconds between events. Because the workspace and system parameter structures are stored in shared memory in the new drive PC, it is possible to delegate the servicing of monitoring connections to the ant_mon processes. This should also allow the PC to accept far more monitoring connections.
Event Generator
The old drive PCs had a hardware event generator that generated an interrupt some number of times per second. This kept the cadence of the processing regular and allowed the drive PC to keep quite accurate time. The drive PC's event generator was set to generate interrupts 100 times per second.
The new drive PCs will have no trouble keeping time accurately, as NTP will do that for us. We still want a
regular cadence for the drive PC functions though, and it would be preferable to keep the 100 Hz cycle time.
So the new drive PC has a software event generator, which is called event_generator. This program sets up an
internal alarm using the struct itimerval timing structure and the setitimer alarm function in GNU C.
This function takes the itimerval structure, which has two elements, and asks for a SIGALRM signal to be
delivered after a time it_value (which can be specified in microseconds). After the SIGALRM signal is
generated, the timer is re-armed automatically to the time it_interval, which for us is set to 1/100 s = 10,000
microseconds. Thus the cadence is always kept constant, as no code is required for our routine to reset the
timer.
Shared memory
Before all this, however, the event generator also sets up a small area of shared memory. The shared memory segment
has a key of 100, and is 8 integer values in size (or 32 bytes). Element 0 of this segment is used by other
processes that want to get events from the event generator. These other processes should attach to the shared
memory, and check that element 0 is set to 0. If it is, then they may insert their PID into this shared memory
element, and the event generator will then add this PID to its list of processes that it sends events to. The
"event" is actually a SIGALRM signal.
Element 1 is set when event_generator starts to be the frequency in Hz of the events that the event generator
will supply. Although no effort is made to ensure this value is read-only, it should be treated as such by other
processes.
Element 2 stores the PID of the perform process, which is described later.
Element 3 stores the PID of the hartbeat process, which is described later.
Element 4 is set when event_generator starts to be the DUT1 correction, and as for element 1, it should be
considered read-only.
Element 5 stores the PID of the sysjob process, which is described later.
Element 6 is used by the hartbeat process to communicate to sysjob what error condition has occurred.
Element 7 is set by sysjob to be 1 when it is taking emergency action, and should be set to 0 at all other
times.
Generating events
Every time the timer produces a SIGALRM signal, the event generator looks at shared memory element 0 to see
if another process has asked to be included, and adds its PID to the list if one has.
After this, the event generator goes through the list and sends a SIGALRM signal to each process it knows
about. It should be noted therefore that although the event generator will always be triggered at a frequency
of 100 Hz, the time between successive alarm signals at the other processes is not guaranteed to be 10 milliseconds.
If the alarm signal cannot be delivered, because the process no longer exists or because the event generator does not
have the proper authority to send signals to it, then the event generator will remove the PID from its list with
no warning.
Heartbeat
The heartbeat process is run in a program called hartbeat, a leftover from the PSOS days when programs could only
have 8 character file names. The terms heartbeat and hartbeat can and will be used interchangeably in this
document.
The heartbeat process is responsible for:
- calculating where the telescope should be at any particular moment, including applying pointing corrections
 - determining the rate the drives should be moving at
 - checking to see that no problems have occurred in the telescope's systems and that observing conditions are not dangerous
 - ensuring that the telescope does not move past its operational limits
 
The hartbeat task starts up first after the event generator and sets up four areas of shared memory. The observatory
code uses two structures to keep track of what state the telescope is in (struct wrkspace) and what the telescope is capable of
(struct syspar). It was decided in the Linux version of the drive PC code that these two structures should be
kept in shared memory so that other processes could learn what state the telescope was in without having to directly
communicate with hartbeat. This was mainly driven by a desire to make monitoring and controlling the telescope
less complicated than it was with the PSOS drive PC; this will be discussed in more detail later.
The wrkspace structure is given the shared memory key 101, while the syspar structure has key 102.
The Linux drive PC also uses shared memory to hold the drive action semaphore and for completion messaging, with key
103. This segment is 6 integers in size, where elements 0, 1 and 2 are for message passing, element 4 holds the
PID of the process waiting for the message and element 5 is the semaphore address.
The fourth shared memory segment (with key NEXTJOB_SHM_KEY) is required now that different tasks can all access the wrkspace structure in
shared memory. This will be discussed in detail in the monitoring section below.
Upon starting, hartbeat reads in the configuration file /obs/linux/cfg/host.cfg, which should contain all the
information about what the telescope is capable of in terms of limits, drive type (XY or AzEl), speeds etc. The use
of this config file makes it easy to use the exact same drive PC code for many telescopes. At this point, the pointing
correction files (specified in host.cfg) are read in as well. The hardware state is initialised and then queried to
see what state the antenna is in, and the time is set. As an interesting note, hartbeat counts every time it gets
an event from the event generator, which should be 100 times per second. It is thus possible to determine how long the
hartbeat task has been running by looking at this number (hartbeat_count), which should increment 8,640,000 times in one day.
The signal SIGALRM is set up to trigger the hartbeat routine, and then the process waits for its events to
arrive.
The hartbeat routine does the following:
- queries the hardware (not yet implemented 2008/12/05 JBS)
 - increments the hartbeat_count
 - gets the time from the PC's internal clock to millisecond accuracy
 - updates the wrkspace structure with the current hardware state (drives, focus, limits, panic buttons, temperatures etc.)
 - calculates the ephemeris and sidereal times
 - updates weather information
 - reads the encoders, and determines which wrap to be in (for an AzEl antenna)
 - checks encoder values for consistency to detect glitches, and calls sysjob if a glitch is detected
 - checks the drives for problems, and calls sysjob if problems are detected
 - checks the wind speed to see if we should be wind stowed, and calls sysjob if a wind stow is warranted
 - applies pointing corrections
 - converts telescope native coordinates to all other coordinate systems
 - calls the heartbeat_drive_action routine to control the drives (described further below)
 - applies acceleration limits
 - calculates what we should be sending to the drives and ensures it is valid
 - tells the drives to move at the calculated rate
 - updates the focus platform
 - generates the flashing lights on the debug box
 - signals perform that we have completed our tasks in time
The heartbeat_drive_action routine (in source file driver.c) does the following:
- check if someone else has control of the drives using the drive action semaphore shared memory location, and exit immediately if this is the case
 - calculate the stopping distance in native coordinates
 - update the target position and rate depending on what the telescope is supposed to be doing (ie. tracking, slewing, scanning etc.)
 - calculate the velocity of the target position in native coordinates, and check that the target position and stopping positions are inside the telescope limits, aborting the operation if they are not
 - if one action is complete, move on to the action required afterwards (eg. after a slew, we must begin tracking)
 
The routines that interact with the hardware have not yet been written for the new drive PC, and the hardware itself is not yet finalised. The current plan is for the drive PC to communicate with a Rabbit PIC and an Allen-Bradley PLC. The Rabbit will use its serial ports to control one each of the azimuth/X and elevation/Y drives, and to read the serial position encoders. This will require at least 4 serial ports. The Rabbit will receive the rate that the drives should be running at from the drive PC, and it will pass this on to the drives in the appropriate fashion. It will also poll the encoders and immediately pass on the values to the drive PC.
The PLC will manage the activity of the slower systems, such as the limit switches, the panic buttons, weather and wind information. This is because the PLC runs at a much slower cadence than the Rabbit and drive PC do, and these states change less often. The PLC will send data to the drive PC, but the drive PC should not have to send much, if any, data to the PLC.
System Job
The system job process, or sysjob, is responsible for taking control of the antenna in an emergency situation
and moving it to a safe location.
After the sysjob task starts, it attaches to the event generator's shared memory and requests that it receive
alarm signals from it. From then on, it wakes up at 100 Hz and checks shared memory (element 6) for an emergency
condition. This condition is set by hartbeat when the wind gets too high (ANTENNA_WIND_STOW), after an
encoder fault (ANTENNA_ENCODER_PROBLEM) or a drives fault (ANTENNA_DRIVES_PROBLEM). If there is no emergency
condition, sysjob goes back to sleep waiting for another alarm signal from the event generator. If there is an
emergency condition, sysjob sets element 7 of shared memory to 1 to signal hartbeat that it is handling the
emergency.
If there has been an encoder or drives fault, then sysjob issues an abort command and waits for the antenna to
stop moving, after which it commands the drives to turn off. If the wind is too high, then sysjob commands the
antenna to park. Once the antenna is in a safe condition (off or parked), sysjob sets element 7 of shared
memory to 0 to signal hartbeat that it is finished handling the emergency.
Antenna Monitor and Control
The majority of software differences between PSOS and Linux versions of the drive PC are in the monitoring and control code.
Under PSOS, the drive PC code started its own socket server and listened for clients, while under Linux we have opted
to use xinetd to listen for clients on port 30384. When a client connects on this port on the drive PC, xinetd
starts an ant_mon process, which then communicates with the network socket as if it were reading and writing to
STDIN and STDOUT respectively.
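A minimal xinetd service entry matching this description might look like the following; the port number comes from the text, while the service name, server path and user are assumptions:

```
service drivepc_mon
{
        type         = UNLISTED
        port         = 30384
        socket_type  = stream
        protocol     = tcp
        wait         = no
        user         = root
        server       = /obs/linux/bin/ant_mon
}
```

With wait = no, xinetd starts one ant_mon process per incoming connection, each with the socket already attached to its STDIN and STDOUT.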
The ant_mon process first checks whether the client wants to control the telescope or get monitoring information,
and switches to either the routine ant_ctrl or ant_mon respectively.
But whereas in the PSOS code the hartbeat routine spent time each loop "servicing" each monitoring request
individually, the Linux code leaves this job to the ant_mon routine, which can now access the telescope state
data through the wrkspace structure in shared memory. Due to the relatively slow speed at which the PSOS
machines were running, it was decided that a maximum of 5 clients could get their monitoring requests serviced at
any particular time. With the Linux code, since the computer that will run the drive PC will be a great deal
faster, and since the ant_mon processes will each run in their own thread, it should be possible to greatly
increase the maximum number of clients, and perhaps even remove the limit entirely.
Monitoring
When the ant_mon routine is called it checks to see what the client is asking for, which should be one of:
- ANT_SETUPLIST: the client wants to set up a new list, so it sends some info about which lists it wants (from the wrkspace or syspar lists, or a combination)
 - ANT_STARTMON: the client wants the drive PC to start collecting the list data
 - ANT_REQUESTLIST: the client wants to collect the data collected by the drive PC
 - ANT_RESETLIST: stop collecting data
 - ANT_KILLLIST: the client doesn't care about the list any more
 - ANT_GETDIAL: the client wants to know the native dial coordinates of the given demanded position
Once the antenna monitor routine has received a request to begin collecting list data, it asks to begin
receiving alarm signals from the event generator so it can begin running at the same cadence as the
hartbeat task. This allows clients to get data at up to 100 Hz.
Control
When the ant_ctrl task is called it first checks where the user is connecting from. If the user is connecting
from the local host (IP 127.0.0.1), then it is the SYSTEM_USER and can override any other telescope user.
This is the case for the SYSJOB process, so it can always turn the telescope off or park it in case of an
emergency.
All other users get REMOTE_USER privileges. This means that, so long as no other user with greater control is using
the telescope, a remote user can give commands to the drive system.
The ant_ctrl routine connects to the client and listens for commands. When it gets one, it collects the data
sent to it and then calls the driver routine to start the telescope going on what the user has asked it to do.
After this, the hartbeat routine takes care of moving the telescope to its destination.
The control program needs to signal its controlling client when the telescope has completed its task. When the
hartbeat routine was in charge of all the control, and the ant_ctrl routine merely signalled hartbeat
to move the telescope, this task was easy. It is not much more difficult with the Linux drive PC, using some
shared memory in the msg variable. This variable uses element 4 (msg[4]) to pass the PID of the ant_mon
process controlling the telescope to the driver routine, which stores it as gw->qid, which used to be the
queue ID from ant_ctrl. After storing the PID, driver blanks msg[4] and ant_ctrl waits until it sees
that msg[4] has been set to the negative of its PID, which is what driver does when the antenna has completed
its task. At this point ant_ctrl sends back the antenna completion code to its client.
The only special case is the ABORT command, which causes the ant_ctrl task that was controlling the telescope
to immediately stop waiting and send back an aborted completion code, and causes the ant_ctrl task giving the abort
command to wait until the telescope has reached an idle state.
Performance Monitoring
Message Logging
The drive PC is usually a silent beast that has no visible on-screen output, as all its tasks run in the background
as services. In addition, the ant_mon and ant_ctrl routines communicate with their clients using STDIN
and STDOUT via xinetd, so messages printed from these routines can cause the network communication to break
down if care is not taken.
Each routine is therefore able to access the system log through some helper macros defined in /obs/generic/sysmsg/root_msg.h.
These macros call a routine that outputs to the system log. For the Linux drive PC, this log is found in
/var/log/everything.log.
Organisation and Compilation
The new Linux drive PC code is part of the /obs directory structure, as many of the libraries in it are required
by the drive PC. The new code is in /obs/linux and consists of the source files ant_ctrl.c, ant_mon.c,
coords.c, display.c, driver.c, drives.c, encoder.c, event_generator.c, focus.c, hardware.c,
hartbeat.c, heartmon.c, init_ant.c, newday.c, nextjob.c, packmon.c, perform.c, pointing.c,
sys_cfg.c and sysjob.c. It also requires the header files ant_cmd.h, coords.h, driver.h, drives.h,
focus.h, newday.h, nextjob.h, packmon.h, pointing.h and sys_cfg.h.
The Makefile compiles event_generator, perform, sysjob, hartbeat and ant_mon.
During normal compilation with the flag -Wall (report all warnings), many warning messages were emitted, mostly about
defined variables not being used. To suppress these warnings, and so make it clearer when a compilation error
does occur, each source file with warnings was given a routine called nowarnings_hartbeat (for example) that uses
these variables and thus stops the warnings. These routines are not called from any of the code, however.
The drive PC
Computer specifications
The computer running the drive PC does not need to be anything special, and indeed it is not. The computer that will be used for the first Linux drive PC has:
- Motherboard: Gigabyte GA-EG31M-S2
 - Processor: Intel Celeron Dual Core E1400
 - RAM: 1 GB Kingston - KVR800D2N5/1G
 - HDD: 80 GB Western Digital WD800AAJS
 - Optical: Lite-On - DH-16D3P
 - PSU: 380W Antec - EA-380
 - Case: Antec - NSK4000
 
This computer costs only ~$650 including GST. The main benefit that this computer has over the old computers running PSOS is the hard drives, which should allow the drive PC to boot up cold in only a few tens of seconds (a virtual PC used for developing the drive PC software was able to boot up into the system in 34 seconds).
Linux
The drive PC uses Arch Linux, which is a lightweight distribution that uses few resources but is also highly configurable and is easy to maintain.
Startup sequence
The drive PC services need to be started in a particular order. The startup scripts for the services are in /etc/rc.d,
and the order that they are started up is configured on the last line of /etc/rc.conf.
The startup scripts, and their order are given below:
- DRIVEPC_1_event_generator: this needs to be started first to establish the shared memory segments, and to start generating the alarm signals that the rest of the system needs to run properly.
 - DRIVEPC_2_hartbeat: the heartbeat is started up next to take control of the telescope's systems.
 - xinetd: needs to be started to listen for incoming client connections; it will also start ant_mon processes as required
 - DRIVEPC_3_sysjob: start the emergency control client after the client listener is started; this should usually be the first client to connect, although it doesn't need to be
 - DRIVEPC_4_perform: start the performance monitor last to ensure everything runs reliably
The startup scripts can be called manually as per normal Debian-type startup scripts, ie. with start, stop, or restart as their argument.
