Title: sendrec() / send() / receive()


Due to the microkernel design of Minix, user processes request operations from the network service through messages. However, user processes never send messages directly to the network service. Instead, user processes send messages to the file system, which then relays the messages to the network service. In the same manner, the network service never sends messages directly to the user processes. The network service sends messages to the file system, which then relays the messages to the user processes.

Below is a more detailed description of how messages are passed.

During kernel initialization, the entries of idt[], the Interrupt Descriptor Table, are configured by int_gate(), which is called by prot_init() (both functions are found in src/kernel/protect.c).

When the kernel, a service, or a user process makes a system call, it sends a message to a server (the file system, in the example below) by indirectly calling sendrec(). For example, if a process opens a file descriptor (from _open.c):

#if _ANSI
PUBLIC int open(const char *name, int flags, ...)
#else
PUBLIC int open(name, flags)
_CONST char *name;
int flags;
#endif
{
va_list argp;
message m;

va_start(argp, flags);
if (flags & O_CREAT) {
       m.m1_i1 = strlen(name) + 1;
       m.m1_i2 = flags;
       m.m1_i3 = va_arg(argp, Mode_t);
       m.m1_p1 = (char *) name;
} else {
       _loadname(name, &m);
       m.m3_i2 = flags;
}
va_end(argp);
return (_syscall(FS, OPEN, &m));
}

open() builds a message and then passes it to _syscall() (defined in syscall.c):

PUBLIC int _syscall(who, syscallnr, msgptr)
int who;
int syscallnr;
register message *msgptr;
{
int status;

msgptr->m_type = syscallnr;
status = _sendrec(who, msgptr);
if (status != 0) {
       /* 'sendrec' itself failed. */
       /* XXX - strerror doesn't know all the codes */
       msgptr->m_type = status;
}
if (msgptr->m_type < 0) {
       errno = -msgptr->m_type;
       return(-1);
}
return(msgptr->m_type);
}

As seen above, _syscall() calls _sendrec() with a pointer to the message as the second argument.

When a software interrupt is raised with the assembly language int instruction, as _sendrec() (defined in src/lib/i386/rts/_sendrec.s) does:

__sendrec:
push ebp
mov ebp, esp
push ebx
mov eax, SRCDEST(ebp) ! eax = dest-src
mov ebx, MESSAGE(ebp) ! ebx = message pointer
mov ecx, BOTH ! _sendrec(srcdest, ptr)
int SYSVEC ! trap to the kernel
pop ebx
pop ebp
ret

the kernel calls the handler that gate_table[] registers for that interrupt vector. For sendrec(), the vector is SYSVEC (#define'd as 33 for protected mode), so the handler is reached through entry 33 of the interrupt descriptor table, counting from 0:


gate_table[] = {
...
       { s_call, SYS386_VECTOR, USER_PRIVILEGE },       /* 386 system call */
...
};
(Note that SYS386_VECTOR is also #define'd as 33. The assembly language routine _s_call is defined in src/kernel/mpx386.s.)

Immediately before the interrupt call in _sendrec(), the stack is as follows:

+--------------+
|     ARG2     |  message pointer
+--------------+
|     ARG1     |  who
+--------------+
|return address|
+--------------+
|     ebp      |  <====== ebp
+--------------+
|     ebx      |  <====== esp
+--------------+

Note also that eax contains the destination (if _receive() had been called, eax would contain the source), ebx contains the address of the message, and ecx indicates whether the operation is a send, a receive, or both.

Essentially, _s_call saves the state of some registers (so that they can be restored later) and then calls sys_call() after setting up sys_call()'s arguments (which are held in eax, ebx, and ecx). After sys_call() returns, the state of the registers is restored.

!*===========================================================================*
!*                                _s_call                                    *
!*===========================================================================*
.align 16
_s_call:
_p_s_call:
cld ! set direction flag to a known value
sub esp, 6*4 ! skip RETADR, eax, ecx, edx, ebx, est
push ebp ! stack already points into proc table
push esi
push edi
o16 push ds
o16 push es
o16 push fs
o16 push gs
mov dx, ss
mov ds, dx
mov es, dx
incb (_k_reenter)
mov esi, esp ! assumes P_STACKBASE == 0
mov esp, k_stktop
xor ebp, ebp ! for stacktrace
! end of inline save
sti ! allow SWITCHER to be interrupted
! now set up parameters for sys_call()
push ebx ! pointer to user message
push eax ! src/dest
push ecx ! SEND/RECEIVE/BOTH

call _sys_call ! sys_call(function, src_dest, m_ptr)
! caller is now explicitly in proc_ptr

mov AXREG(esi), eax ! sys_call MUST PRESERVE si
cli ! disable interrupts

! Fall into code to restart proc/task running.

!*===========================================================================*
!* restart *
!*===========================================================================*

_restart:
...

sys_call() checks that the source or destination and the requested function are legal before calling mini_send() or mini_rec() (or both). Note that proc_ptr is used to indicate which process made the system call.


sys_call(), mini_send(), and mini_rec() are all found in src/kernel/proc.c.

/*===========================================================================*
* sys_call *
*===========================================================================*/
PUBLIC int sys_call(function, src_dest, m_ptr)
int function; /* SEND, RECEIVE, or BOTH */
int src_dest; /* source to receive from or dest to send to */
message *m_ptr; /* pointer to message */
{
/* The only system calls that exist in MINIX are sending and receiving
* messages. These are done by trapping to the kernel with an INT instruction.
* The trap is caught and sys_call() is called to send or receive a message
* (or both). The caller is always given by proc_ptr.
*/

register struct proc *rp;
int n;

/* Check for bad system call parameters. */
if (!isoksrc_dest(src_dest)) return(E_BAD_SRC);
rp = proc_ptr;

if (isuserp(rp) && function != BOTH) return(E_NO_PERM);

/* The parameters are ok. Do the call. */
if (function & SEND) {
       /* Function = SEND or BOTH. */
       n = mini_send(rp, src_dest, m_ptr);

       if (function == SEND || n != OK)
              return(n);       /* done, or SEND failed */
}

/* Function = RECEIVE or BOTH.
* We have checked user calls are BOTH, and trust 'function' otherwise.
*/
return(mini_rec(rp, src_dest, m_ptr));
}


Note the isuserp(rp) check near the top of the function above and the comment before its final return. User processes are only allowed to call sendrec() and are not allowed to call send() or receive() on their own. Why is this? Kees Bot answered this question in the comp.os.minix newsgroup:

"(Requiring user processes to call sendrec() rather than send() and receive()) is mostly for security reasons. User processes can only use sendrec() because this requires that they wait for a reply (we don't want them to send() and then never bother to receive()). We don't let (user processes) talk to tasks so that tasks don't have to bother checking who they're dealing with. And we don't allow user processes to talk to other user processes, because they might stall on each other in a way that MM and FS can't undo."



The most interesting sections of mini_send() are the deadlock check, the branch that delivers the message directly, and the branch that blocks and queues the caller.

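If the destination process is itself blocked trying to send a message to the source, either directly or through a chain of blocked senders, completing the send would cause a deadlock, so mini_send() returns ELOCKED.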

If a deadlock does not occur, there are two possibilities:

1) The destination process is waiting to receive a message from either the specific source or from any source. Copy the message to the destination and unblock the destination process (i.e., mark the process as ready to begin executing again).

2) The destination process is not waiting to receive a message from either the specific source or from any source. Block the source until the destination receives the message.

/*===========================================================================*
*                            mini_send                             *
*===========================================================================*/
PRIVATE int mini_send(caller_ptr, dest, m_ptr)
register struct proc *caller_ptr;       /* who is trying to send a message? */
int dest;                     /* to whom is message being sent? */
message *m_ptr;                     /* pointer to message buffer */
{
/* Send a message from 'caller_ptr' to 'dest'. If 'dest' is blocked waiting
* for this message, copy the message to it and unblock 'dest'. If 'dest' is
* not waiting at all, or is waiting for another source, queue 'caller_ptr'.
*/

register struct proc *dest_ptr, *next_ptr;
vir_bytes vb;                     /* message buffer pointer as vir_bytes */
vir_clicks vlo, vhi;              /* virtual clicks containing message to send */

/* User processes are only allowed to send to FS and MM. Check for this. */
if (isuserp(caller_ptr) && !issysentn(dest)) return(E_BAD_DEST);
dest_ptr = proc_addr(dest);       /* pointer to destination's proc entry */
if (isemptyp(dest_ptr)) return(E_BAD_DEST);       /* dead dest */

#if ALLOW_GAP_MESSAGES
/* This check allows a message to be anywhere in data or stack or gap.
* It will have to be made more elaborate later for machines which
* don't have the gap mapped.
*/
vb = (vir_bytes) m_ptr;
vlo = vb >> CLICK_SHIFT;       /* vir click for bottom of message */
vhi = (vb + MESS_SIZE - 1) >> CLICK_SHIFT;       /* vir click for top of msg */
if (vlo < caller_ptr->p_map[D].mem_vir || vlo > vhi ||
vhi >= caller_ptr->p_map[S].mem_vir + caller_ptr->p_map[S].mem_len)
return(EFAULT);
#else
/* Check for messages wrapping around top of memory or outside data seg. */
vb = (vir_bytes) m_ptr;
vlo = vb >> CLICK_SHIFT;       /* vir click for bottom of message */
vhi = (vb + MESS_SIZE - 1) >> CLICK_SHIFT;       /* vir click for top of msg */
if (vhi < vlo ||
vhi - caller_ptr->p_map[D].mem_vir >= caller_ptr->p_map[D].mem_len)
       return(EFAULT);
#endif

/* Check for deadlock by 'caller_ptr' and 'dest' sending to each other. */
if (dest_ptr->p_flags & SENDING) {
       next_ptr = proc_addr(dest_ptr->p_sendto);
       while (TRUE) {
              if (next_ptr == caller_ptr) return(ELOCKED);
              if (next_ptr->p_flags & SENDING)
                     next_ptr = proc_addr(next_ptr->p_sendto);
              else
                     break;
       }
}


/* Check to see if 'dest' is blocked waiting for this message. */
if ( (dest_ptr->p_flags & (RECEIVING | SENDING)) == RECEIVING &&
(dest_ptr->p_getfrom == ANY ||
dest_ptr->p_getfrom == proc_number(caller_ptr))) {
       /* Destination is indeed waiting for this message. */
       CopyMess(proc_number(caller_ptr), caller_ptr, m_ptr, dest_ptr,
               dest_ptr->p_messbuf);
       dest_ptr->p_flags &= ~RECEIVING;       /* deblock destination */
       if (dest_ptr->p_flags == 0) ready(dest_ptr);
}
else {
       /* Destination is not waiting. Block and queue caller. */
       caller_ptr->p_messbuf = m_ptr;
       if (caller_ptr->p_flags == 0) unready(caller_ptr);
       caller_ptr->p_flags |= SENDING;
       caller_ptr->p_sendto= dest;

       /* Process is now blocked. Put in on the destination's queue. */
       if ( (next_ptr = dest_ptr->p_callerq) == NIL_PROC)
              dest_ptr->p_callerq = caller_ptr;
       else {
              while (next_ptr->p_sendlink != NIL_PROC)
                     next_ptr = next_ptr->p_sendlink;
              next_ptr->p_sendlink = caller_ptr;
       }
       caller_ptr->p_sendlink = NIL_PROC;
}

return(OK);
}


The most interesting parts of mini_rec() are the scan of the caller's queue of waiting senders and the code that blocks the caller.

If the process src, mini_rec()'s second parameter, is trying to send a message to the destination, copy the message from the source to the destination and unblock the source. If src is ANY, the first sender in the queue is accepted.

If the process src is not trying to send a message to the destination caller_ptr, mini_rec()'s first parameter, block the destination until the source sends a message.

/*===========================================================================*
*                            mini_rec                             *
*===========================================================================*/
PRIVATE int mini_rec(caller_ptr, src, m_ptr)
register struct proc *caller_ptr;       /* process trying to get message */
int src;                     /* which message source is wanted (or ANY) */
message *m_ptr;                     /* pointer to message buffer */
{
/* A process or task wants to get a message. If one is already queued,
* acquire it and deblock the sender. If no message from the desired source
* is available, block the caller. No need to check parameters for validity.
* Users calls are always sendrec(), and mini_send() has checked already.
* Calls from the tasks, MM, and FS are trusted.
*/

register struct proc *sender_ptr;
register struct proc *previous_ptr;

/* Check to see if a message from desired source is already available. */
if (!(caller_ptr->p_flags & SENDING)) {
       /* Check caller queue. */
       for (sender_ptr = caller_ptr->p_callerq; sender_ptr != NIL_PROC;
            previous_ptr = sender_ptr, sender_ptr = sender_ptr->p_sendlink) {
              if (src == ANY || src == proc_number(sender_ptr)) {
                     /* An acceptable message has been found. */
                     CopyMess(proc_number(sender_ptr), sender_ptr,
                              sender_ptr->p_messbuf, caller_ptr, m_ptr);
                     if (sender_ptr == caller_ptr->p_callerq)
                            caller_ptr->p_callerq = sender_ptr->p_sendlink;
                     else
                            previous_ptr->p_sendlink = sender_ptr->p_sendlink;
                     if ((sender_ptr->p_flags &= ~SENDING) == 0)
                            ready(sender_ptr);       /* deblock sender */
                     return(OK);
              }
       }

       /* Check for blocked interrupt. */
       if (caller_ptr->p_int_blocked && isrxhardware(src)) {
              m_ptr->m_source = HARDWARE;
              m_ptr->m_type = HARD_INT;
              caller_ptr->p_int_blocked = FALSE;
              return(OK);
       }
}

/* No suitable message is available. Block the process trying to receive. */
caller_ptr->p_getfrom = src;
caller_ptr->p_messbuf = m_ptr;
if (caller_ptr->p_flags == 0) unready(caller_ptr);
caller_ptr->p_flags |= RECEIVING;


/* If MM has just blocked and there are kernel signals pending, now is the
* time to tell MM about them, since it will be able to accept the message.
*/
if (sig_procs > 0 && proc_number(caller_ptr) == MM_PROC_NR && src == ANY)
       inform();
return(OK);
}


One confusing aspect of the functions and assembly routines involves underscores (_). In the network service, send() and receive() are called in several locations, yet the assembly routines are named __send and __receive (both with two underscores), and those names are never called directly. So how does this transformation occur? To begin with, the following #define's are found in include/minix/syslib.h (which is #include'd in inet.h, which in turn is #include'd in every file that calls these functions):

/* Hide names to avoid name space pollution. */
#define sendrec              _sendrec
#define receive              _receive
#define send              _send

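This accounts for one of the underscores. What about the second underscore? The C compiler prepends an underscore to every external C name when it generates the assembly-level symbol, so an assembly routine that is to be called from C must carry that extra leading underscore. So, for example, a C function reaches the assembly routine __receive by calling receive(): the preprocessor turns receive into _receive, and the compiler turns _receive into __receive. This accounts for the second underscore.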



THE FILE SYSTEM


So when the file system receives a message, what does it do with it?

In order to understand what the file system does after it receives a message, two important arrays in src/fs/table.c, call_vec[] and dmap[], must be studied. call_vec[] is the array that matches a request type to a function within the file system. For example, for read requests, do_read() is called and for open requests, do_open() is called:

PUBLIC _PROTOTYPE (int (*call_vec[]), (void) ) = {
...
       do_read,       /* 3 = read       */
       do_write,      /* 4 = write      */
       do_open,       /* 5 = open       */
...
};

Note that do_open() is a generic function for all file types (e.g., regular files, character special files, block special files, etc.).

dmap[], on the other hand, has functions that are specific to the device and the request.

PUBLIC struct dmap dmap[] = {
/*   ?                Open/Close  I/O       Task #       Device File
     -                ----------  --------  -----------  ------ ----  */
  DT(1,               no_dev,     0,        0)           /*  0 = not used   */
  DT(1,               gen_opcl,   gen_io,   MEM)         /*  1 = /dev/mem   */
  DT(1,               gen_opcl,   gen_io,   FLOPPY)      /*  2 = /dev/fd0   */
  DT(NR_CTRLRS >= 1,  gen_opcl,   gen_io,   CTRLR(0))    /*  3 = /dev/c0    */
  DT(1,               tty_opcl,   gen_io,   TTY)         /*  4 = /dev/tty00 */
  DT(1,               ctty_opcl,  ctty_io,  TTY)         /*  5 = /dev/tty   */
  DT(ENABLE_PRINTER,  gen_opcl,   gen_io,   PRINTER)     /*  6 = /dev/lp    */

#if (MACHINE == IBM_PC)
  DT(1,               no_dev,     0,        ANY)         /*  7 = /dev/ip    */
  DT(NR_CTRLRS >= 2,  gen_opcl,   gen_io,   CTRLR(1))    /*  8 = /dev/c1    */
  DT(0,               0,          0,        0)           /*  9 = not used   */
  DT(NR_CTRLRS >= 3,  gen_opcl,   gen_io,   CTRLR(2))    /* 10 = /dev/c2    */
  DT(0,               0,          0,        0)           /* 11 = not used   */
  DT(NR_CTRLRS >= 4,  gen_opcl,   gen_io,   CTRLR(3))    /* 12 = /dev/c3    */
  DT(ENABLE_SB16,     gen_opcl,   gen_io,   SB16)        /* 13 = /dev/audio */
  DT(ENABLE_SB16,     gen_opcl,   gen_io,   SB16MIXER)   /* 14 = /dev/mixer */
#endif /* IBM_PC */
};


The element of interest in dmap[] is major device number 7:

PUBLIC struct dmap dmap[] = {
/*   ?                Open/Close  I/O       Task #       Device File
     -                ----------  --------  -----------  ------ ----  */
...
  DT(1,               no_dev,     0,        ANY)         /*  7 = /dev/ip    */
...
};

This entry is later changed by a call to svrctl(). This call to svrctl() changes the "Open/Close" field to clone_opcl() and the "I/O" field to gen_io(). Therefore, when the file system receives a read, write, or ioctl request for major device number 7, gen_io() is called, and when the file system receives an open or close request for major device number 7, clone_opcl() is called. gen_io(), in turn, calls sendrec() to send a request message to the network service. (Note that clone_opcl() calls gen_io().)

Let's say a user process opens /dev/udp. The request to open the file is sent in the normal manner from the kernel to the file system. This message is received by get_work(), which was called in the endless loop within main() (both functions are found in src/fs/main.c). main() then calls the function within call_vec[] for the open request. The relevant entry in call_vec[] for an open request is the following:

PUBLIC _PROTOTYPE (int (*call_vec[]), (void) ) = {
...
       do_open,       /* 5 = open       */
...
};

Therefore, do_open() is called, which then calls common_open() (do_open() and common_open() are both found in src/fs/open.c). common_open() checks permissions and then calls dev_open() (found in src/fs/device.c) since /dev/udp (and all other files associated with the network service) is a character special file. dev_open() then calls the appropriate function within dmap[]. As described above, this function is clone_opcl().