
0043001 /*
0043002 ip_write.c
0043003 
0043004 Copyright 1995 Philip Homburg
0043005 */
As the name of the file suggests, ip_write.c handles the ip portion of a write. If an ip device file (e.g., /dev/ip) is opened directly, ip_write() is called by sr_rwio(). If, on the other hand, a higher-layer device file (e.g., /dev/udp) is opened instead, ip_write() is called by a higher-layer function (e.g., udp's restart_write_fd()).

The most important function in ip_write.c is ip_send(). ip_send() sets some of the fields of the outgoing packet's ip header and then either sends the packet to a lower-layer protocol (e.g., ethernet) or places the packet in the ip port's loopback queue (i.e., in the ip_loopb_head queue).


0043006 
0043007 #include "inet.h"
0043008 #include "buf.h"
0043009 #include "event.h"
0043010 #include "type.h"
0043011 
0043012 #include "arp.h"
0043013 #include "assert.h"
0043014 #include "clock.h"
0043015 #include "eth.h"
0043016 #include "icmp_lib.h"
0043017 #include "io.h"
0043018 #include "ip.h"
0043019 #include "ip_int.h"
0043020 #include "ipr.h"
0043021 
0043022 THIS_FILE
0043023 
0043024 FORWARD void error_reply ARGS(( ip_fd_t *fd, int error ));
0043025 
0043026 PUBLIC int ip_write (fd, count)
0043027 int fd;
0043028 size_t count;
ip_write()

ip_write() simply gets an ip packet from a higher layer and then calls ip_send().

For example, after assembling an ip packet and placing the packet in the write queue of the appropriate udp port, udp's restart_write_fd() calls ip_write(), which then calls udp_get_data() (through the ip file descriptor's if_get_userdata field) to get the packet from the queue.

It is important to note that ip_write() ALWAYS RETURNS NW_OK! In previous versions of the network service, ip_write() returned other values (including NW_SUSPEND).

udp write path

For a write to a udp device (e.g., /dev/udp), ip_write() sits in the middle of the following call path:


sr_rwio()
  udp_write()
    restart_write_fd()
      ip_write()
        ip_send()
          if (packet is destined for a system on the local ethernet network) {
              ipeth_send()
                if (no previous packet is being processed by the ethernet task)
                    eth_send()
                    if (eth_send() can't immediately send the packet)
                        eth_write()
          }
          else if (packet must be routed)
              oroute_frag()
          else if (packet is destined for a local destination)
              ev_enqueue()



0043029 {
0043030          ip_fd_t *ip_fd;
0043031          acc_t *pack;
0043032          int r;
0043033 
Find the ip file descriptor whose index within ip_fd_table[] is fd, the first parameter of ip_write().


0043034          ip_fd= &ip_fd_table[fd];
0043035          if (count > IP_MAX_PACKSIZE)
If the ip file descriptor was opened by the udp code and the packet's length is over 40000 bytes (i.e., over IP_MAX_PACKSIZE bytes), the system does not behave properly. error_reply() calls udp_get_data(), which in this scenario only clears out the up_wr_pack field without setting up_wr_count to zero. Therefore, when ip_write() returns, reply_thr_get() incorrectly reports to the process that requested the write that a nonzero number of bytes were written.

Since 40000 bytes is extremely large for a packet, this problem probably rarely surfaces.

IP_MAX_PACKSIZE is #define'd in include/net/gen/in.h:

#define IP_MAX_PACKSIZE 40000 /* 8192 */
/* Note: this restriction is not part of the IP-protocol but introduced by this implementation. */


0043036          {
0043037                   error_reply (ip_fd, EPACKSIZE);
error_reply() / ip_write

error_reply(ip_fd, error) simply calls the function referenced by the if_get_userdata field of the ip file descriptor ip_fd (error_reply()'s first parameter).

If a process opened up an ip device file (e.g., /dev/ip) directly and therefore opened up the ip file descriptor ip_fd, if_get_userdata was set to sr_get_userdata(). If this is the case, sr_get_userdata() simply sends the file system an error code or the number of bytes written.

If the udp code opened up the ip file descriptor ip_fd, if_get_userdata was set to udp_get_data(). If this is the case, udp_get_data() simply removes the packet from the up_wr_pack field and turns off the write flag for the udp file descriptor. Note that in this scenario, error (error_reply()'s second parameter) is not used.

If the icmp code opened up the ip file descriptor, if_get_userdata was set to icmp_getdata().

Note that error_reply() is somewhat of a misnomer. Even if an error did not occur, error_reply() is called.
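The reporting pattern described above can be illustrated with a small model. This is a sketch in ANSI C under simplifying assumptions, not the Minix code: toy_fd_t, toy_upper_get_data(), toy_error_reply(), toy_ip_write() and the TOY_* status values are all hypothetical stand-ins for ip_fd_t, the if_get_userdata callback, error_reply(), ip_write() and the real status codes, and the call to ip_send() is replaced by a canned send_status. The point is only the shape: every outcome (an error code or the byte count) travels back through the callback with a zero count, and the function itself always returns OK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder status codes, not the Minix values. */
#define TOY_OK          0
#define TOY_EPACKSIZE (-1)
#define TOY_MAX_PACK    40000u   /* mirrors IP_MAX_PACKSIZE */

typedef struct toy_fd toy_fd_t;

/* Hypothetical stand-in for if_get_userdata: a zero count means
 * "here is a result" (error code or byte count); a nonzero count
 * means "give me count bytes of packet data". */
typedef void *(*get_userdata_t)(toy_fd_t *fd, long result, size_t count);

struct toy_fd {
    get_userdata_t get_userdata;
    int send_status;     /* what the toy "ip_send" step reports */
    long last_result;    /* last result seen by the toy upper layer */
    int replies;         /* number of results reported so far */
};

static char dummy_packet[1];

/* Toy upper layer (think udp_get_data()): records replies and
 * hands out a dummy packet when asked for data. */
void *toy_upper_get_data(toy_fd_t *fd, long result, size_t count)
{
    if (count == 0) {
        fd->last_result = result;
        fd->replies++;
        return NULL;
    }
    return dummy_packet;
}

/* Like error_reply(): forward the result through the callback. */
static void toy_error_reply(toy_fd_t *fd, long result)
{
    (void)fd->get_userdata(fd, result, 0);
}

/* Like ip_write(): results go out via toy_error_reply(), and the
 * function itself always returns TOY_OK. */
int toy_ip_write(toy_fd_t *fd, size_t count)
{
    if (count > TOY_MAX_PACK) {
        toy_error_reply(fd, TOY_EPACKSIZE);
        return TOY_OK;
    }
    if (fd->get_userdata(fd, 0, count) == NULL)
        return TOY_OK;
    /* The real code passes the packet to ip_send() here; this sketch
     * uses the canned send_status instead. */
    if (fd->send_status == TOY_OK)
        toy_error_reply(fd, (long)count);   /* success: byte count */
    else
        toy_error_reply(fd, fd->send_status);
    return TOY_OK;
}
```

With send_status set to TOY_OK, a 100-byte write makes the toy upper layer record 100; with an error status it records that error; a 50000-byte write records TOY_EPACKSIZE without the callback ever being asked for data. In every case toy_ip_write() itself returns TOY_OK.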


0043038                   return NW_OK;
0043039          }
0043040          pack= (*ip_fd->if_get_userdata)(ip_fd->if_srfd, (size_t)0,
0043041                   count, FALSE);
if_get_userdata is the higher-layer "get data" function. For example, when the udp code opens an ip file descriptor (by calling ip_open()), the if_get_userdata field is set to udp_get_data() and the if_srfd is set to the ip file descriptor's associated udp port. udp_get_data() returns the packet that was obtained from the process requesting the write.

For the icmp layer, if_get_userdata is set to icmp_getdata(). Again, icmp_getdata() returns the packet that was obtained from the process requesting the write.


udp_get_data()


udp_get_data() is called only indirectly by the ip code. udp_get_data() has a number of uses, which makes it a somewhat difficult function to understand.

During the initialization of a udp port (when the state of the udp port is UPS_SETPROTO), udp_get_data() is called twice. The first time udp_get_data() is called, the ip code tries to initialize the udp port's underlying ip file descriptor. udp_get_data() is called the second time (through reply_thr_get()) to change the state of the udp port to UPS_GETCONF.

After initialization (when the state of the udp port is UPS_MAIN), udp_get_data() is called to send either packets or configuration data to the underlying layer (i.e., the ip layer). For a write to a udp file descriptor, a packet is placed in the udp file descriptor's associated udp port's write field and udp_get_data() is then called by ip_write() to move the data to the ip layer. If ip_ioctl() is called to configure a udp port, ip_ioctl() calls udp_get_data() to get configuration data for the port (including, for example, the ip address of the port).

After a write operation completes, udp_get_data() is called a second time (through error_reply()) to clear some of the udp port's fields in preparation for the next write.


0043042          if (!pack)
0043043                   return NW_OK;
It is important to note that when the udp code calls ip_write(), the (indirect) call to udp_get_data() on lines 43040 - 43041 never returns NULL. Here is the reason why:

In order to get the packet from the udp code, udp_get_data() calls bf_cut(). bf_cut() only returns NULL if all 3 of its arguments are 0 (zero), which cannot be the case if the udp code called ip_write(). To be more specific, if a process requested a write of 0 bytes to a udp file descriptor, the (indirect) call to sr_get_userdata() in restart_write_fd() would have returned NULL and restart_write_fd() would have returned on line 62207, before calling ip_write().


0043044          r= ip_send(fd, pack, count);
ip_send()

ip_send() sets a few of the outgoing ip packet's ip header fields, checks the ip options of the packet (if there are any), and then either places the packet in the loopback queue of the ip port (if the packet is destined for the loopback address or the ip address of the ip port itself) or sends the packet off to a lower layer by calling its protocol-specific function (e.g., ipeth_send() for the ethernet layer).


0043045          assert(r != NW_WOULDBLOCK);
0043046 
0043047          if (r == NW_OK)
0043048                   error_reply (ip_fd, count);


0043049          else
0043050                   error_reply (ip_fd, r);


0043051          return NW_OK;
0043052 }
0043053 
0043054 PUBLIC int ip_send(fd, data, data_len)
0043055 int fd;
0043056 acc_t *data;
0043057 size_t data_len;



0043058 {
0043059          ip_port_t *ip_port;
0043060          ip_fd_t *ip_fd;
0043061          ip_hdr_t *ip_hdr, *tmp_hdr;
0043062          ipaddr_t dstaddr, netmask, nexthop, hostrep_dst;
0043063          u8_t *addrInBytes;
0043064          acc_t *tmp_pack, *tmp_pack1;
0043065          int hdr_len, hdr_opt_len, r;
0043066          int broadcast, ttl;
0043067          ev_arg_t arg;
0043068 
Locate the ip file descriptor whose index within ip_fd_table[] is fd, the first parameter of ip_send(), and then find the ip file descriptor's associated ip port.


0043069          ip_fd= &ip_fd_table[fd];
0043070          ip_port= ip_fd->if_port;
0043071 
0043072          if (!(ip_fd->if_flags & IFF_OPTSET))
An ip file descriptor cannot be used until it has been configured. Verify that the ip file descriptor has been configured.


0043073          {
0043074                   bf_afree(data);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, if acc_linkC or buf_linkC of an accessor in the linked list is not equal to one (1), not all of the chain will be freed. For example, suppose bf_afree(acc1) is called for the following chain:

[Figure: accessor chain before bf_afree(acc1); not reproduced]

Then the resulting chain will be:

[Figure: accessor chain after bf_afree(acc1); not reproduced]

bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).
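The link-count rule can be sketched as follows. The structures are reduced, hypothetical stand-ins for acc_t and buf_t that keep only acc_linkC, buf_linkC, acc_buffer and acc_next, and instead of returning elements to acc_freelist or a buffer freelist the sketch just sets a freed flag. It assumes, as the example above suggests, that the walk stops at the first accessor that is still referenced elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for buf_t and acc_t. */
typedef struct toy_buf {
    int buf_linkC;      /* number of accessors referencing this buffer */
    int freed;          /* set when the buffer would go back to its freelist */
} toy_buf_t;

typedef struct toy_acc {
    int acc_linkC;      /* number of references to this accessor */
    toy_buf_t *acc_buffer;
    struct toy_acc *acc_next;
    int freed;          /* set when the accessor would go to acc_freelist */
} toy_acc_t;

/* Drop one reference at each step of the chain.  Stop as soon as an
 * accessor is still referenced: whoever holds it also holds the rest
 * of the chain behind it. */
void toy_bf_afree(toy_acc_t *acc)
{
    while (acc != NULL) {
        toy_acc_t *next = acc->acc_next;

        assert(acc->acc_linkC > 0);
        if (--acc->acc_linkC > 0)
            break;
        assert(acc->acc_buffer->buf_linkC > 0);
        if (--acc->acc_buffer->buf_linkC == 0)
            acc->acc_buffer->freed = 1;
        acc->freed = 1;
        acc = next;
    }
}
```

If acc1 and acc2 share one buffer, freeing acc1 releases acc1 but leaves the buffer in place with buf_linkC 1, which is the situation of buffers512[127] in the example above.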


0043075                   return EBADMODE;
0043076          }
0043077 
0043078          data_len= bf_bufsize(data);
bf_bufsize()

bf_bufsize() returns the total buffer size of a linked list of accessors (i.e., the sum of acc_length for the accessors in a linked list).
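As a sketch, with a hypothetical toy_acc_t reduced to acc_length and acc_next, the computation is a simple walk down the chain:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for acc_t with only the fields needed here. */
typedef struct toy_acc {
    size_t acc_length;          /* bytes of data held by this accessor */
    struct toy_acc *acc_next;
} toy_acc_t;

/* Sum acc_length over the whole chain, as bf_bufsize() is described. */
size_t toy_bf_bufsize(const toy_acc_t *acc)
{
    size_t total = 0;

    for (; acc != NULL; acc = acc->acc_next)
        total += acc->acc_length;
    return total;
}
```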

For a detailed description of the network service's buffer management, see the separate buffer management page.


0043079 
0043080          assert(ip_fd->if_port->ip_flags & IPF_IPADDRSET);
0043081 
0043082          if (ip_fd->if_ipopt.nwio_flags & NWIO_RWDATONLY)
From ip(4):

"NWIO_RWDATONLY specifies that the header should be omitted from a write request. This option can only be used when all header fields are specified in previous options: NWIO_EN_LOC, NWIO_DI_BROAD, NWIO_REMSPEC, NWIO_PROTOSPEC and NWIO_HDR_O_SPEC. A read operation will also only return the data part, so the IP options will be lost."

By default, neither the udp code nor the icmp code sets the NWIO_RWDATONLY flag.


0043083          {
0043084                   tmp_pack= bf_memreq(IP_MIN_HDR_SIZE);
IP_MIN_HDR_SIZE is the minimum size of an ip header (i.e., an ip header without ip options) and is #define'd in include/net/gen/in.h:

#define IP_MIN_HDR_SIZE 20


bf_memreq()


After the buffers have been initialized, accessors[] looks like the following:

[Figure: accessors[] after buffer initialization; not reproduced]

bf_memreq() allocates accessors to the caller. For example, if 1514 bytes of buffer space are requested immediately after the network process starts and each buffer is 512 bytes (the default), then accessors[] will look like the following:

[Figure: accessors[] after a 1514-byte bf_memreq() request; not reproduced]

Note that three elements of accessors[] have been removed from buf512_freelist and that the head of the chain of the 3 accessors is returned by bf_memreq(). Also note that the acc_linkC and buf_linkC fields have been set to one and acc_length and acc_offset have been set to their appropriate values.

So what happens if there are not enough buffers on the buf512_freelist to satisfy a request? On lines 2280-2290 of buf.c, functions that free buffers for the specific clients (e.g., eth_buffree()) are called until there are enough buffers on buf512_freelist.
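The 1514-byte example can be checked with a small calculation. This sketch only computes the acc_length each accessor would get; it assumes 512-byte buffers, full buffers first and the remainder in the last accessor, and it does not model the freelists or the client free functions. toy_memreq_layout() and TOY_BUF_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUF_SIZE 512u   /* default buffer size mentioned in the text */

/* Hypothetical helper: compute the acc_length of each accessor that a
 * request of `size` bytes would need.  Writes at most `max` lengths
 * into `lens` and returns the number of accessors required. */
size_t toy_memreq_layout(size_t size, size_t *lens, size_t max)
{
    size_t n = 0;

    while (size > 0) {
        size_t chunk = size < TOY_BUF_SIZE ? size : TOY_BUF_SIZE;

        if (n < max)
            lens[n] = chunk;
        n++;
        size -= chunk;
    }
    return n;
}
```

For 1514 bytes it reports 3 accessors (512, 512 and 490 bytes under the ordering assumed above), which matches the three elements taken from buf512_freelist in the example.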

For a complete description of the network service's buffer management, see the separate buffer management page.


0043085                   tmp_pack->acc_next= data;
0043086                   data= tmp_pack;
0043087                   data_len += IP_MIN_HDR_SIZE;
0043088          }
0043089          if (data_len<IP_MIN_HDR_SIZE)
The length of the data plus the ip header must (obviously) be at least the size of an ip header.


0043090          {
0043091                   bf_afree(data);


0043092                   return EPACKSIZE;
0043093          }
0043094 
0043095          data= bf_packIffLess(data, IP_MIN_HDR_SIZE);
bf_packIffLess()

If the first accessor in a linked list holds fewer than min_len (the second parameter) bytes, bf_packIffLess(pack, min_len) packs the data by calling bf_pack().

bf_packIffLess() is often called to ensure that a packet's header is in a single contiguous buffer so that the individual fields of the header can be easily accessed.
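A sketch of the idea, under the assumption that the test looks only at the first accessor's acc_length (which is what guarantees a contiguous header). The toy_acc_t type, toy_pack() (a stand-in for bf_pack()) and toy_packIffLess() are hypothetical; toy_pack() copies into malloc()ed memory and, unlike the real code, does not free the old chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accessor: a pointer to its bytes plus a length. */
typedef struct toy_acc {
    unsigned char *data;
    size_t acc_length;
    struct toy_acc *acc_next;
} toy_acc_t;

/* Stand-in for bf_pack(): copy the whole chain into one accessor.
 * The old chain is left alone in this sketch. */
static toy_acc_t *toy_pack(const toy_acc_t *acc)
{
    size_t total = 0, off = 0;
    const toy_acc_t *p;
    toy_acc_t *out;

    for (p = acc; p != NULL; p = p->acc_next)
        total += p->acc_length;
    out = malloc(sizeof *out);
    if (out == NULL)
        return NULL;
    out->data = malloc(total > 0 ? total : 1);
    if (out->data == NULL) {
        free(out);
        return NULL;
    }
    for (p = acc; p != NULL; p = p->acc_next) {
        memcpy(out->data + off, p->data, p->acc_length);
        off += p->acc_length;
    }
    out->acc_length = total;
    out->acc_next = NULL;
    return out;
}

/* Pack only when the first accessor is too short to hold min_len
 * contiguous bytes (e.g., a whole ip header). */
toy_acc_t *toy_packIffLess(toy_acc_t *pack, size_t min_len)
{
    if (pack == NULL || pack->acc_length >= min_len)
        return pack;
    return toy_pack(pack);
}
```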

For a detailed description of the network service's buffer management, see the separate buffer management page.


0043096          ip_hdr= (ip_hdr_t *)ptr2acc_data(data);
If called from the udp code or the icmp code, the udp code and the icmp code will have placed an ip header on the packet before sending the packet off to ip_write().


ptr2acc_data()


The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.
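Written as a function instead of a macro, the same expression looks like the sketch below (toy_buf_t, toy_acc_t and toy_ptr2acc_data() are hypothetical reductions of buf_t, acc_t and the macro). In the macro, assigning a to bf_temporary_acc ensures the argument expression is evaluated only once; a function gets that behavior automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reductions of buf_t and acc_t. */
typedef struct toy_buf { unsigned char *buf_data_p; } toy_buf_t;
typedef struct toy_acc {
    toy_buf_t *acc_buffer;
    size_t acc_offset;        /* where this accessor's data starts */
} toy_acc_t;

/* Same expression as ptr2acc_data: the address of the accessor's
 * first byte inside its buffer. */
unsigned char *toy_ptr2acc_data(toy_acc_t *a)
{
    return &a->acc_buffer->buf_data_p[a->acc_offset];
}
```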


ip_hdr_t


ip_hdr_t (a typedef for struct ip_hdr) is the structure of an ip header. "ih" (e.g., ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr
{
    u8_t ih_vers_ihl, ih_tos;
    u16_t ih_length, ih_id, ih_flags_fragoff;
    u8_t ih_ttl, ih_proto;
    u16_t ih_hdr_chk;
    ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header plus options (if there are any) in 4-byte units, so the header's actual length in bytes is 4 times the value stored in these bits (e.g., 5 for a 20-byte header without options). An example of an option is a list of routers that a packet should follow to its destination.

The upper 4 bits hold the version number (4 for IPv4).
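Extracting the two fields is a matter of masking and shifting. The helper names below are hypothetical; the 0x0f mask plays the role that IH_IHL_MASK plays on line 43140.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the first header byte. */
unsigned ip_version(uint8_t vers_ihl)   { return vers_ihl >> 4; }
unsigned ip_hdr_bytes(uint8_t vers_ihl) { return (vers_ihl & 0x0fu) * 4u; }

/* Build the byte from a version and a header length in bytes
 * (hdr_len must be a multiple of 4 and at most 60). */
uint8_t ip_make_vers_ihl(unsigned version, unsigned hdr_len)
{
    return (uint8_t)((version << 4) | (hdr_len / 4u));
}
```

For the common value 0x45 this gives version 4 and a 20-byte header; 0x4f gives the maximum 60-byte header.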


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: The value of ih_id for the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, the IH_DONT_FRAG flag is set in ih_flags_fragoff. If more fragments follow (e.g., in the 3rd fragment of 4), the IH_MORE_FRAGS flag is set.

If the packet is indeed just a fragment of a larger packet, the offset part of this field gives the starting position of the fragment's data within the original ip packet's data, in units of 8 bytes. So for example, if an ip packet with 2000 bytes of data (not including the ip header) is broken up into two fragments of 1496 and 504 bytes, the first fragment has a fragmentation offset of 0 bytes and the second a fragmentation offset of 1496 bytes. Since the second fragment is the last one (no flags set), its ih_flags_fragoff is therefore 187 (1496 / 8 = 187).
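The offset arithmetic can be sketched as below. The FRAG_* constants follow the RFC 791 bit layout and are local to this sketch; whether Minix's IH_DONT_FRAG and IH_MORE_FRAGS use exactly these values is an assumption. The values are in host byte order; the real code converts them to network order (note the HTONS() on line 43134).

```c
#include <assert.h>
#include <stdint.h>

/* RFC 791 layout of the 16-bit flags/fragment-offset field.
 * Hypothetical local names, not the Minix IH_* constants. */
#define FRAG_DF      0x4000u   /* don't fragment */
#define FRAG_MF      0x2000u   /* more fragments follow */
#define FRAG_OFFMASK 0x1fffu   /* offset in 8-byte units */

/* Field value (host order) for a fragment whose data starts byte_off
 * bytes into the original payload; byte_off must be a multiple of 8. */
uint16_t frag_field(unsigned byte_off, int more_frags)
{
    uint16_t v = (uint16_t)((byte_off / 8u) & FRAG_OFFMASK);

    if (more_frags)
        v = (uint16_t)(v | FRAG_MF);
    return v;
}

/* Recover the byte offset from a field value, ignoring the flags. */
unsigned frag_byte_offset(uint16_t field)
{
    return (field & FRAG_OFFMASK) * 8u;
}
```

frag_field(1496, 0) gives 187, the second fragment of the example above; the first fragment, with more fragments following, gets 0x2000.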


ih_ttl: "Time to live" for the packet. As a packet is routed to its destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an icmp "time exceeded" message to the source. The ttl keeps packets that can't reach their destination from bouncing around between routers indefinitely. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). This code was written before the ttl field was redefined to be strictly a hop count. The original IP RFC (RFC 791) defines the ttl field as the time to live in seconds.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


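ih_hdr_chk: Checksum for the header: the 16-bit one's complement of the one's complement sum of all 16-bit words in the header. It covers only the header, not the data.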
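For reference, the standard Internet checksum (RFC 1071) over a header stored in network byte order can be sketched as follows. This is a generic implementation, not Minix's own checksum routine, and ip_hdr_checksum() is a hypothetical name. The ih_hdr_chk bytes must be zero while the checksum is computed; running the same function over a header that already contains a correct checksum yields 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 checksum over `len` bytes in network byte order
 * (hypothetical helper; ip headers always have an even length). */
uint16_t ip_hdr_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    if (len & 1)                        /* pad an odd trailing byte */
        sum += (uint32_t)hdr[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}
```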


ih_src, ih_dst: Source and destination ip address of the ip packet.


IP HEADER (as given by RFC 791)


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



0043097          if (data->acc_linkC != 1 || data->acc_buffer->buf_linkC != 1)
If the first accessor or its associated buffer is shared with another linked list of accessors, copy the ip header into a freshly allocated buffer (and attach the rest of the data with bf_cut()) so that the header modifications below do not affect the other linked list.


0043098          {
0043099                   tmp_pack= bf_memreq(IP_MIN_HDR_SIZE);


0043100                   tmp_hdr= (ip_hdr_t *)ptr2acc_data(tmp_pack);


0043101                   *tmp_hdr= *ip_hdr;
0043102                   tmp_pack->acc_next= bf_cut(data, IP_MIN_HDR_SIZE,
0043103                            data_len-IP_MIN_HDR_SIZE);
bf_cut()

If a section of a linked list needs to be duplicated, bf_cut(data, offset, length) is called. For example, if a section of length 50 starting at an offset of 75 of the linked list below needs to be duplicated, bf_cut(data, 75, 50) is called:

[Figure: example linked list for bf_cut(data, 75, 50); not reproduced]

Note that the original linked list remains unchanged and that acc_linkC for all the accessors in the new linked list is one.

If length (the third parameter) is zero, bf_cut() simply duplicates the first accessor in the linked list but sets acc_length=0 and acc_next=NULL. In other words, it creates a linked list of a single accessor whose acc_length is 0.

bf_cut() is used in a number of scenarios, including cutting a received ethernet packet to size.
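The behavior described above (new accessors, original chain untouched, buffers shared rather than copied) can be sketched like this. The types and toy_bf_cut() are hypothetical reductions; bumping buf_linkC for each shared buffer is an assumption about how the sharing is accounted for, and unlike bf_cut() this sketch returns NULL instead of a zero-length accessor when length is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reductions of buf_t and acc_t. */
typedef struct toy_buf { int buf_linkC; } toy_buf_t;
typedef struct toy_acc {
    toy_buf_t *acc_buffer;
    size_t acc_offset, acc_length;
    int acc_linkC;
    struct toy_acc *acc_next;
} toy_acc_t;

/* Build a new chain describing bytes [offset, offset+length) of data.
 * New accessors point into the same buffers (buf_linkC is bumped) and
 * the source chain is not modified.  Allocation failure is only
 * checked with assert() in this sketch. */
toy_acc_t *toy_bf_cut(const toy_acc_t *data, size_t offset, size_t length)
{
    toy_acc_t *head = NULL, **tail = &head;

    for (; data != NULL && length > 0; data = data->acc_next) {
        size_t take;
        toy_acc_t *n;

        if (offset >= data->acc_length) {   /* skip this accessor */
            offset -= data->acc_length;
            continue;
        }
        take = data->acc_length - offset;
        if (take > length)
            take = length;
        n = malloc(sizeof *n);
        assert(n != NULL);
        n->acc_buffer = data->acc_buffer;
        n->acc_buffer->buf_linkC++;
        n->acc_offset = data->acc_offset + offset;
        n->acc_length = take;
        n->acc_linkC = 1;
        n->acc_next = NULL;
        *tail = n;
        tail = &n->acc_next;
        length -= take;
        offset = 0;
    }
    return head;
}
```

Cutting 50 bytes at offset 75 from three 60-byte accessors skips the first accessor entirely and yields two accessors of 45 and 5 bytes.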

For a full description of the network service's buffer management, see the separate buffer management page.



0043104                   bf_afree(data);


0043105                   ip_hdr= tmp_hdr;
0043106                   data= tmp_pack;
0043107                   assert (data->acc_length >= IP_MIN_HDR_SIZE);
0043108          }
0043109 
0043110          if (ip_fd->if_ipopt.nwio_flags & NWIO_HDR_O_SPEC)
From ip(4):

"NWIO_HDR_O_SPEC specifies all IP header options in advance. The values are taken from nwio_hdropt, nwio_tos, nwio_ttl, and nwio_df."

nwio_hdropt, nwio_tos, nwio_ttl, and nwio_df are fields of the ip file descriptor's if_ipopt structure (see lines 43112 and 43131-43135).

As with the NWIO_RWDATONLY flag, the udp code and the icmp code do not set the NWIO_HDR_O_SPEC flag.


0043111          {
0043112                   hdr_opt_len= ip_fd->if_ipopt.nwio_hdropt.iho_opt_siz;
0043113                   if (hdr_opt_len)
If there are ip options for an ip file descriptor whose NWIO_HDR_O_SPEC flag is set, the ip options for any outgoing packet will be taken from the iho_data[] field of the ip file descriptor. The length will be equal to the ip file descriptor's iho_opt_siz field.

If this is the case, insert the ip options between the ip header and the data.
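On a flat byte array the same insertion looks like the sketch below. The real code works on accessor chains (bf_cut() for the header and the data, bf_memreq() for the options); here toy_insert_opts() and TOY_MIN_HDR are hypothetical names, and the sketch keeps the version nibble while updating the IHL, whereas line 43130 simply stores hdr_len/4. It does not update the total length or the checksum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MIN_HDR 20u   /* plays the role of IP_MIN_HDR_SIZE */

/* Copy pkt (a 20-byte option-less header followed by data) into out,
 * inserting opt_len option bytes between header and data and updating
 * the IHL nibble.  opt_len must be a multiple of 4 and at most 40.
 * Returns the new packet length, or 0 on bad input or a short out. */
size_t toy_insert_opts(const uint8_t *pkt, size_t pkt_len,
                       const uint8_t *opts, size_t opt_len,
                       uint8_t *out, size_t out_size)
{
    size_t new_len = pkt_len + opt_len;
    size_t hdr_len = TOY_MIN_HDR + opt_len;

    if (pkt_len < TOY_MIN_HDR || opt_len % 4 != 0 || opt_len > 40 ||
        new_len > out_size)
        return 0;
    memcpy(out, pkt, TOY_MIN_HDR);                  /* fixed header */
    memcpy(out + TOY_MIN_HDR, opts, opt_len);       /* options */
    memcpy(out + hdr_len, pkt + TOY_MIN_HDR,        /* data */
           pkt_len - TOY_MIN_HDR);
    out[0] = (uint8_t)((out[0] & 0xf0u) | (hdr_len / 4u));
    return new_len;
}
```

Inserting four option bytes (three NOPs and an end-of-list byte) into a 24-byte packet yields a 28-byte packet whose first byte changes from 0x45 to 0x46.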


0043114                   {
0043115                            tmp_pack= bf_cut(data, 0, IP_MIN_HDR_SIZE);



0043116                            tmp_pack1= bf_cut (data, IP_MIN_HDR_SIZE,
0043117                                     data_len-IP_MIN_HDR_SIZE);



0043118                            bf_afree(data);


0043119                            data= bf_packIffLess(tmp_pack, IP_MIN_HDR_SIZE);


0043120                            ip_hdr= (ip_hdr_t *)ptr2acc_data(data);


0043121                            tmp_pack= bf_memreq (hdr_opt_len);


0043122                            memcpy (ptr2acc_data(tmp_pack), ip_fd->if_ipopt.
0043123                                     nwio_hdropt.iho_data, hdr_opt_len);


0043124                            data->acc_next= tmp_pack;
0043125                            tmp_pack->acc_next= tmp_pack1;
0043126                            hdr_len= IP_MIN_HDR_SIZE+hdr_opt_len;
0043127                   }
0043128                   else
The ip file descriptor is not configured with any ip options.


0043129                            hdr_len= IP_MIN_HDR_SIZE;
Since the ip file descriptor's NWIO_HDR_O_SPEC flag is set, set the ip header's type of service (tos), the fragmentation offset flags, and the time to live (ttl) fields according to the ip file descriptor's configuration.


0043130                   ip_hdr->ih_vers_ihl= hdr_len/4;
0043131                   ip_hdr->ih_tos= ip_fd->if_ipopt.nwio_tos;
0043132                   ip_hdr->ih_flags_fragoff= 0;
0043133                   if (ip_fd->if_ipopt.nwio_df)
0043134                            ip_hdr->ih_flags_fragoff |= HTONS(IH_DONT_FRAG);
0043135                   ip_hdr->ih_ttl= ip_fd->if_ipopt.nwio_ttl;
0043136                   ttl= ORTD_UNREACHABLE+1;              /* Don't check TTL */
0043137          }
0043138          else
The ip file descriptor's NWIO_HDR_O_SPEC flag is not set. Verify that the ip header passed in with the data is acceptable.


0043139          {
Verify that the ip header's purported length is consistent and that the header's time to live (ttl) is nonzero.
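The checks on lines 43140-43147 can be sketched as a standalone C function. The result codes are hypothetical stand-ins for NW_OK, EINVAL and EPACKSIZE, and IP_MIN_HDR_SIZE (20) and IH_IHL_MASK (0x0f) are assumed values matching a 20-byte minimum header and a 4-bit IHL field.

```c
#include <assert.h>
#include <stddef.h>

#define IP_MIN_HDR_SIZE 20      /* assumed: 20-byte header without options */
#define IH_IHL_MASK     0x0f    /* assumed: IHL in the low nibble */

/* Hypothetical result codes standing in for NW_OK, EINVAL, EPACKSIZE. */
enum hdr_check { HDR_OK, HDR_EINVAL, HDR_EPACKSIZE };

/* Same checks, in the same order, as lines 43140-43147. */
static enum hdr_check check_ip_hdr(unsigned char vers_ihl, unsigned char ttl,
	size_t data_len)
{
	size_t hdr_len = (size_t)(vers_ihl & IH_IHL_MASK) * 4;

	if (hdr_len < IP_MIN_HDR_SIZE)
		return HDR_EINVAL;     /* IHL smaller than 5 words */
	if (hdr_len > data_len)
		return HDR_EPACKSIZE;  /* header claims more bytes than were written */
	if (ttl == 0)
		return HDR_EINVAL;     /* packet could never be forwarded */
	return HDR_OK;
}
```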


0043140                   hdr_len= (ip_hdr->ih_vers_ihl & IH_IHL_MASK)*4;
0043141                   r= NW_OK;
0043142                   if (hdr_len<IP_MIN_HDR_SIZE)
0043143                            r= EINVAL;
0043144                   else if (hdr_len>data_len)
0043145                            r= EPACKSIZE;
0043146                   else if (!ip_hdr->ih_ttl)
0043147                            r= EINVAL;
0043148                   if (r != NW_OK)
0043149                   {
0043150                            bf_afree(data);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, an accessor or buffer whose link count (acc_linkC or buf_linkC) is greater than one is still referenced elsewhere, so it is not freed. For example, suppose bf_afree(acc1) is called, where acc1 (accessors[63]) and acc2 (accessors[64]) both reference buffers512[127].

bf_afree() returns acc1 to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).
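The rule above can be illustrated with a minimal sketch of reference-counted chain freeing. It assumes simplified, hypothetical structures and uses counters in place of the real free lists. It follows the description (drop one reference per element, free only at zero); whether the real bf_afree() stops at the first still-shared accessor, as this sketch does, is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified structures. */
typedef struct buf { int buf_linkC; } buf_t;
typedef struct acc {
	int acc_linkC;
	buf_t *acc_buffer;
	struct acc *acc_next;
} acc_t;

/* Counters standing in for acc_freelist and the buffer free lists. */
static int accs_freed, bufs_freed;

/* Walk the chain, dropping one reference from each accessor.  An
 * accessor is freed only when its count reaches zero, and its buffer
 * only when the buffer's count also reaches zero. */
static void chain_free(acc_t *acc)
{
	while (acc != NULL) {
		acc_t *next = acc->acc_next;

		assert(acc->acc_linkC > 0 && acc->acc_buffer != NULL);
		if (--acc->acc_linkC > 0)
			break;          /* still shared: leave the rest alone */
		assert(acc->acc_buffer->buf_linkC > 0);
		if (--acc->acc_buffer->buf_linkC == 0)
			bufs_freed++;   /* buffer would return to its free list */
		accs_freed++;           /* accessor would return to acc_freelist */
		acc = next;
	}
}
```

In the example above, acc1 and acc2 both point at a buffer with buf_linkC equal to 2: freeing acc1 releases the accessor but only decrements the buffer's count, and the buffer is released when acc2 is freed later.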


0043151                            return r;
0043152                   }
0043153 
If there are ip options, verify that they are acceptable.


0043154                   data= bf_packIffLess(data, hdr_len);
bf_packIffLess()

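If the first accessor in the chain holds fewer than min_len (the second parameter) bytes, bf_packIffLess(pack, min_len) packs the data by calling bf_pack(); otherwise the chain is returned unchanged. (This is why the assert on line 43207 can check data->acc_length directly.)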

bf_packIffLess() is often called to ensure that a packet's header is in a single contiguous buffer so that the individual fields of the header can be easily accessed.
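The "pack only when the header is split" idea can be sketched with a hypothetical seg_t chain rather than the network service's accessors. Unlike bf_pack(), which works with the buffer pool, this version allocates with malloc() and leaves the original chain for the caller to free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment type: each node owns len bytes at data. */
typedef struct seg {
	size_t len;
	unsigned char *data;
	struct seg *next;
} seg_t;

static size_t chain_len(const seg_t *s)
{
	size_t n = 0;

	for (; s != NULL; s = s->next)
		n += s->len;
	return n;
}

/* If the first segment already holds min_len bytes, return the chain
 * unchanged; otherwise copy the whole chain into one new contiguous
 * segment.  Returns NULL if memory runs out. */
static seg_t *pack_if_less(seg_t *s, size_t min_len)
{
	seg_t *packed;
	size_t off = 0;

	if (s == NULL || s->len >= min_len)
		return s;
	packed = malloc(sizeof *packed);
	if (packed == NULL)
		return NULL;
	packed->len = chain_len(s);
	packed->data = malloc(packed->len ? packed->len : 1);
	if (packed->data == NULL) {
		free(packed);
		return NULL;
	}
	for (; s != NULL; s = s->next) {
		memcpy(packed->data + off, s->data, s->len);
		off += s->len;
	}
	packed->next = NULL;
	return packed;
}
```

Checking only the first segment keeps the common case (header already contiguous) free of any copying.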



0043155                   ip_hdr= (ip_hdr_t *)ptr2acc_data(data);


0043156                   if (hdr_len != IP_MIN_HDR_SIZE)
0043157                   {
0043158                            r= ip_chk_hdropt((u8_t *)(ptr2acc_data(data) +
0043159                                     IP_MIN_HDR_SIZE),
0043160                                     hdr_len-IP_MIN_HDR_SIZE);
ip_chk_hdropt()

ip_chk_hdropt() goes through the ip header options (if there are any) and verifies that the options are acceptable. For example, ip_chk_hdropt() verifies that the same ip header option is not listed twice.
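The kind of check described here can be sketched in C. This is a hypothetical checker written from RFC 791's option layout, not the Minix ip_chk_hdropt(); it verifies only option lengths and duplicate option types, while the real function may check more.

```c
#include <assert.h>
#include <stddef.h>

/* Option layout per RFC 791: type 0 ends the list, type 1 is a one-byte
 * no-op, and every other option has a length byte that counts the type
 * and length bytes themselves.  Returns 1 if acceptable, 0 otherwise. */
static int opts_ok(const unsigned char *opt, size_t len)
{
	unsigned char seen[256] = { 0 };
	size_t i = 0;

	while (i < len) {
		unsigned char type = opt[i];

		if (type == 0)
			break;                  /* end of option list */
		if (type == 1) {
			i++;                    /* no-op padding */
			continue;
		}
		if (i + 1 >= len || opt[i + 1] < 2 || i + opt[i + 1] > len)
			return 0;               /* truncated or bad length */
		if (seen[type]++)
			return 0;               /* same option listed twice */
		i += opt[i + 1];
	}
	return 1;
}
```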


0043161                            if (r != NW_OK)
0043162                            {
0043163                                     bf_afree(data);
0043164                                     return r;
0043165                            }
0043166                   }
0043167                   ttl= ip_hdr->ih_ttl;
0043168          }
0043169          
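The following lines apply to all cases. They fill in these IP header fields: Version, Total Length, Flags/Fragment Offset (the unused, more-fragments, and fragment offset bits are cleared), Identification, Source Address, Header Checksum and, depending on the file descriptor's configuration, Protocol and Destination Address.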

Note that the protocol and destination address could have been previously set (by restart_write_fd(), for example).


IP HEADER (as given by RFC 791)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




ip_hdr_t


ip_hdr_t is the structure of an ip header. The prefix "ih" (as in ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr
{
	u8_t ih_vers_ihl, ih_tos;
	u16_t ih_length, ih_id, ih_flags_fragoff;
	u8_t ih_ttl, ih_proto;
	u16_t ih_hdr_chk;
	ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header, including options (if there are any), counted in 32-bit words; the header length in bytes is therefore 4 times the value stored in these bits. An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits hold the version number (4 for IPv4).
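Packing and unpacking ih_vers_ihl takes only a mask and a shift. The helpers below are hypothetical; IH_IHL_MASK is assumed to be 0x0f (the low nibble), and make_vers_ihl() combines the IHL with version 4 the way lines 43130 and 43170-43171 do.

```c
#include <assert.h>

#define IH_IHL_MASK 0x0f   /* assumed: IHL in the low nibble */
#define IP_VERSION  4

/* Header length in bytes: the low nibble counts 32-bit words. */
static unsigned hdr_len_bytes(unsigned char vers_ihl)
{
	return (vers_ihl & IH_IHL_MASK) * 4u;
}

/* Version number from the high nibble. */
static unsigned hdr_version(unsigned char vers_ihl)
{
	return vers_ihl >> 4;
}

/* Build the byte from a header length in bytes, forcing version 4. */
static unsigned char make_vers_ihl(unsigned hdr_len)
{
	assert(hdr_len % 4 == 0 && hdr_len / 4 <= IH_IHL_MASK);
	return (unsigned char)((hdr_len / 4) | (IP_VERSION << 4));
}
```

A plain 20-byte header therefore starts with the byte 0x45, and the largest possible header (60 bytes) with 0x4f.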


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: For the first packet sent out, ip_init() sets this value to the number of clock ticks since reboot (i.e., the value returned by get_time()); it is then incremented for each packet sent out. The receiving host uses this value to reassemble fragments if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, the IH_DONT_FRAG flag is set in ih_flags_fragoff. If more fragments follow (e.g., in the 3rd fragment of 4 fragments), the IH_MORE_FRAGS flag is set.

If the packet is just a fragment of a packet, the fragment offset gives the starting position (in 8-byte units) of this fragment's data within the original ip packet's data. For example, if an ip packet with 2000 bytes of data (not including the ip header) is broken up into two fragments of 1496 and 504 bytes, the first fragment has a fragment offset of 0 (with IH_MORE_FRAGS set), and the second fragment's data starts at byte 1496, so its fragment offset field is 187 (1496 / 8 = 187).
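The arithmetic of this example can be checked with a small C helper. The flag values below are assumptions chosen to match the RFC 791 layout of the 16-bit field (flags in the top 3 bits, offset in the low 13 bits); frag_field() is hypothetical and returns the value in host byte order, so it would still need htons() before being stored in a header.

```c
#include <assert.h>

/* Assumed values matching the RFC 791 bit positions (host byte order). */
#define IH_DONT_FRAG    0x4000
#define IH_MORE_FRAGS   0x2000
#define IH_FRAGOFF_MASK 0x1fff

/* Field value for a fragment whose data starts at byte 'offset' of the
 * original datagram's data; offset must be a multiple of 8. */
static unsigned short frag_field(unsigned offset, int more_frags)
{
	assert(offset % 8 == 0 && offset / 8 <= IH_FRAGOFF_MASK);
	return (unsigned short)((offset / 8) | (more_frags ? IH_MORE_FRAGS : 0));
}
```

For the 1496/504 split, frag_field(0, 1) gives 0x2000 for the first fragment and frag_field(1496, 0) gives 187 for the second.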


ih_ttl: "Time to live" for the packet. As a packet is routed to its destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an icmp "time exceeded" message to the source. The ttl prevents packets that can't reach their destination from bouncing around between routers indefinitely. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). The original IP RFC (RFC 791) defines the ttl field as the time to live in seconds; this code was written before the field was redefined to be strictly a hop count.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


ih_hdr_chk: Checksum for the header.


ih_src, ih_dst: Source and destination ip address of the ip packet.





0043170          ip_hdr->ih_vers_ihl= (ip_hdr->ih_vers_ihl & IH_IHL_MASK) |
0043171                   (IP_VERSION << 4);
0043172          ip_hdr->ih_length= htons(data_len);
htons() / ntohs() / htonl() / ntohl()

From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of a multi-byte value differently. For example, on a "little-endian" machine (such as an Intel x86 CPU), the 16-bit value 0x1234 is stored in memory as the byte 0x34 followed by the byte 0x12. On a "big-endian" machine, the value 0x1234 is stored as 0x12 followed by 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

For little-endian hosts, htons() is defined in /include/net/hton.h as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
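The sketch below shows the byte swap that the htons() macro performs and a portable way to decide whether to apply it. swap16(), host_is_little_endian(), and my_htons() are hypothetical names; in practice the system's own htons() would be used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The byte swap performed by the htons() macro, written as a function
 * so the argument is evaluated once without a global _tmp. */
static uint16_t swap16(uint16_t x)
{
	return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* 1 on a little-endian host: the low-order byte of 0x1234 (0x34) is
 * stored at the lowest address. */
static int host_is_little_endian(void)
{
	uint16_t v = 0x1234;
	unsigned char first;

	memcpy(&first, &v, 1);
	return first == 0x34;
}

/* Host to network order: swap only where the host order differs. */
static uint16_t my_htons(uint16_t x)
{
	return host_is_little_endian() ? swap16(x) : x;
}
```

Whatever the host, the two bytes of my_htons(0x1234) appear in memory as 0x12 followed by 0x34, which is the big-endian network order.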


0043173          ip_hdr->ih_flags_fragoff &= ~HTONS(IH_FRAGOFF_MASK |
0043174                   IH_FLAGS_UNUSED | IH_MORE_FRAGS);
0043175          if (ip_fd->if_ipopt.nwio_flags & NWIO_PROTOSPEC)
0043176                   ip_hdr->ih_proto= ip_fd->if_ipopt.nwio_proto;
From ip(4):

"NWIO_PROTOSPEC restricts communication to one IP protocol, specified in nwio_proto."

The udp code (IPPROTO_UDP) and the icmp code (IPPROTO_ICMP) both set the NWIO_PROTOSPEC flag.


0043177          ip_hdr->ih_id= htons(ip_port->ip_frame_id++);
0043178          ip_hdr->ih_src= ip_fd->if_port->ip_ipaddr;
As the source ip address, use the address of the ip file descriptor's associated ip port.


0043179          if (ip_fd->if_ipopt.nwio_flags & NWIO_REMSPEC)
0043180                   ip_hdr->ih_dst= ip_fd->if_ipopt.nwio_rem;
From ip(4):

"NWIO_REMSPEC can be used to restrict communication to one remote host. This host is taken from the nwio_rem field."

Neither the udp code nor the icmp code sets the NWIO_REMSPEC flag.


0043181 
0043182          dstaddr= ip_hdr->ih_dst;
0043183          hostrep_dst= ntohl(dstaddr);
0043184          r= 0;
Both local broadcasts and multicasts are acceptable.


0043185          if (hostrep_dst == (ipaddr_t)-1)
0043186                   ;       /* OK, local broadcast */
Casting -1 to ipaddr_t (a u32_t) results in 0xffffffff (hex). This corresponds to address 255.255.255.255, the 'limited broadcast' address.

ipaddr_t is declared in /include/net/gen/in.h:

typedef u32_t ipaddr_t;


0043187          else if ((hostrep_dst & 0xe0000000l) == 0xe0000000l)
0043188                   ;       /* OK, Multicast */
Note that 0xe0 is equal to 224 (decimal).

The Internet Assigned Numbers Authority (IANA), the organization that controls the assignment of IP multicast addresses, has assigned the Class D address space to be used for IP multicast. All IP multicast group addresses will fall in the range of 224.0.0.0 to 239.255.255.255.

Addresses in the 224.0.0.0 through 224.0.0.255 are used by network protocols on a local network segment. Packets with these addresses should never be forwarded by a router; they remain local on a particular LAN segment. They are always transmitted with a time-to-live (TTL) of 1.

The range of addresses from 239.0.0.0 through 239.255.255.255 contains limited scope addresses or administratively scoped addresses.

EBADDEST is #define'd in include/errno.h:

#define EBADDEST (_SIGN 55) /* not a valid destination address */


0043189          else if ((hostrep_dst & 0xf0000000l) == 0xf0000000l)
0043190                   r= EBADDEST;       /* Bad class */
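Note that 0xf0 is 240 (decimal).

The intent of this test is to reject any address whose first octet is 240 or more (e.g., 241.0.0.45). Note, however, that the multicast test on line 43187 masks only the top three bits, so it also matches every address from 240.0.0.0 upward; as written, such addresses are accepted as multicast and this branch is never reached.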


0043191          else if ((dstaddr ^ ip_port->ip_ipaddr) & ip_port->ip_subnetmask)
0043192                   ;       /* OK, remote destination */
Determine if the ip address is remote (and can be delivered).

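(dstaddr ^ ip_port->ip_ipaddr) has a 1 bit wherever the destination address and the ip port's address differ, and is zero if they are the same. AND'ing the result with ip_port->ip_subnetmask keeps only the network bits, so the expression is nonzero (TRUE) exactly when the destination lies on a different network than the ip port. (If no subnet mask has been set, i.e., the mask is zero, the expression is always FALSE.)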

Note that the symbol ^ is the Exclusive OR (XOR) symbol.

192.168.1.1 ^ 192.168.1.1 = 0.0.0.0 = 0


Below is a refresher from http://www.mcsemag.info/basics/infra15.htm. Note that some parts are specific to Microsoft's implementation. In short, if the destination address is not on the same network as the port's ip address (as determined by the port's subnet mask), the destination is remote.

"When a TCP/IP host starts up, it ANDs its own IP address with its own subnet mask and keeps the result (= NetID) in RAM (different implementations use different solutions to achieve this).

Whenever the local host tries to contact another host it ANDs the other host's IP address with its own Subnet Mask and then it checks to see if the result it received is equal to its own.

If it is equal, then the communication is local. The next step is ARP'ing the remote host (a function that involves hardware delivery). If it isn't equal, the communication is remote (the next step is checking routing opportunities)."


0043193          else if (!(dstaddr & ~ip_port->ip_subnetmask))
0043194                   r= EBADDEST;       /* Zero host part */
One cannot send to a network address. Determine if the ip address is a network address.

For example, the ip address 192.168.5.0 with a subnet mask of 255.255.255.0 (192.168.5.0/24) is a network address.

192.168.5.0 & ~(255.255.255.0) = 192.168.5.0 & (0.0.0.255) = 0.0.0.0 = 0
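The destination checks on lines 43185-43194 can be replayed outside the network service with the sketch below, which keeps the original masks and order. The enum labels and classify_dst() are hypothetical. Addresses are taken in host byte order; the Minix code applies the last two tests to network-order values, which gives the same answers because XOR, AND and NOT work bit by bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the outcomes of lines 43185-43194. */
enum dst_kind { DST_BROADCAST, DST_MULTICAST, DST_BAD_CLASS,
	DST_REMOTE, DST_ZERO_HOST, DST_LOCAL };

static enum dst_kind classify_dst(uint32_t dst, uint32_t port_addr,
	uint32_t subnetmask)
{
	if (dst == (uint32_t)-1)
		return DST_BROADCAST;           /* 255.255.255.255 */
	if ((dst & 0xe0000000u) == 0xe0000000u)
		return DST_MULTICAST;           /* same mask as line 43187 */
	if ((dst & 0xf0000000u) == 0xf0000000u)
		return DST_BAD_CLASS;           /* 240.0.0.0 and above */
	if ((dst ^ port_addr) & subnetmask)
		return DST_REMOTE;              /* different network */
	if (!(dst & ~subnetmask))
		return DST_ZERO_HOST;           /* network address */
	return DST_LOCAL;
}
```

With a port at 192.168.1.1/255.255.255.0, 192.168.2.1 comes back as DST_REMOTE and 192.168.1.0 as DST_ZERO_HOST, while 241.0.0.45 (0xF100002D) comes back as DST_MULTICAST: with these masks, DST_BAD_CLASS is never returned.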


0043195          if (r<0)
0043196          {
0043197                   DIFBLOCK(1, r == EBADDEST,
0043198                            printf("bad destination: ");
0043199                            writeIpAddr(ip_hdr->ih_dst);
0043200                            printf("\n"));
0043201                   bf_afree(data);
0043202                   return r;
0043203          }
0043204          ip_hdr_chksum(ip_hdr, hdr_len);
ip_hdr_chksum()

ip_hdr_chksum() sets the ih_hdr_chk field of an ip header with the checksum of the ip header (minus the checksum field). This checksum is obtained from oneC_sum().


0043205 
0043206          data= bf_packIffLess(data, IP_MIN_HDR_SIZE);


0043207          assert (data->acc_length >= IP_MIN_HDR_SIZE);
0043208          ip_hdr= (ip_hdr_t *)ptr2acc_data(data);


0043209 
0043210          addrInBytes= (u8_t *)&dstaddr;
addrInBytes points to the first byte of the destination address and is used to determine if the first octet of the destination address is 127 (see the next line of code). Addresses that begin with 127 (e.g., 127.0.0.1) are local addresses.


0043211 
0043212          if ((addrInBytes[0] & 0xff) == 0x7f)       /* local loopback */
The first byte of the destination address is 0x7f (127 decimal). Address 127.x.x.x is a local address.


0043213          {
0043214                   assert (data->acc_linkC == 1);
0043215                   dstaddr= ip_hdr->ih_dst;       /* swap src and dst
0043216                                                * addresses */
0043217                   ip_hdr->ih_dst= ip_hdr->ih_src;
0043218                   ip_hdr->ih_src= dstaddr;
If a packet is sent by a process to a 127.x.x.x address, the packet is immediately sent back to the same process. By placing the 127.x.x.x address in the source field of the ip header, the process knows that the packet was not from a remote host.


0043219                   data->acc_ext_link= NULL;
Packets in the ip_loopb_head/ip_loopb_tail queue (i.e., packets destined for the loopback address) are linked together by the acc_ext_link field. Since this will be the last packet in the queue, acc_ext_link for this packet will be null.


0043220                   if (ip_port->ip_loopb_head == NULL)
Place the packet at the tail (which may also be the head) of the ip port's loopback queue and place the corresponding event at the end of the system-wide event queue. When the system-wide event queue is later processed in the main loop, ip_process_loopb() will then pass the packets back to the process that sent them.
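The append step can be sketched on its own. pkt_t, port_t, and the event_count field are hypothetical simplifications: event_count stands in for the ev_enqueue() call, so the sketch can show that the event is scheduled only when the first packet arrives in an empty queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an accessor chain and an ip port. */
typedef struct pkt {
	struct pkt *acc_ext_link;   /* next packet in the loopback queue */
} pkt_t;

typedef struct port {
	pkt_t *loopb_head, *loopb_tail;
	int event_count;            /* counts stand-in ev_enqueue() calls */
} port_t;

/* Append at the tail, as on lines 43219-43229. */
static void loopb_append(port_t *port, pkt_t *p)
{
	assert(port != NULL && p != NULL);
	p->acc_ext_link = NULL;             /* p becomes the last packet */
	if (port->loopb_head == NULL) {
		port->loopb_head = p;       /* empty queue: p is also the head */
		port->event_count++;        /* schedule processing once */
	} else {
		port->loopb_tail->acc_ext_link = p;
	}
	port->loopb_tail = p;
}
```

Because the event is queued only on the empty-to-nonempty transition, one pass of ip_process_loopb() can drain every packet appended before it runs.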


0043221                   {
0043222                            ip_port->ip_loopb_head= data;
0043223                            arg.ev_ptr= ip_port;
0043224                            ev_enqueue(&ip_port->ip_loopb_event,
0043225                                     ip_process_loopb, arg);
event_t / ev_enqueue() / ev_process() / ev_init() / ev_in_queue()

The event_t typedef is declared in inet/generic/event.h:

typedef struct event
{
	ev_func_t ev_func;
	ev_arg_t ev_arg;
	struct event *ev_next;
} event_t;
If an event needs to be scheduled, ev_enqueue() is called to place the event in the system-wide event queue whose head is ev_head. ev_process() is eventually called from the main loop in inet.c to process the events. ev_in_queue(ev) simply returns TRUE if the event ev, ev_in_queue()'s only parameter, has a non-null value for func (see below) and FALSE if func is null. In this way, ev_in_queue() determines whether the event has been configured.

ev_init(ev) simply zeroes out the ev_func and ev_next fields of the event ev, ev_init()'s only parameter.

ev_func: A function (e.g., ip_process_loopb()) that performs some task.

ev_arg:

typedef union ev_arg
{
	int ev_int;
	void *ev_ptr;
} ev_arg_t;
ev_arg is ev_func's argument. In the case of a packet destined for the loopback address (127.0.0.1), the argument will be the ip port associated with the ip file descriptor that is sending out the packet. In the case of a message from the ethernet task that caused a deadlock, ev_arg is a pointer to the message's destination ethernet port.

ev_next: The next event in the system-wide event queue.
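A minimal FIFO event queue with the same shape can be sketched as follows, assuming the callback receives the event and its argument (an assumed signature for ev_func_t). my_ev_enqueue(), my_ev_process(), my_ev_in_queue(), ev_tail, and the log_int() demo callback are hypothetical; the sketch clears ev_func when an event is dequeued so that the in-queue test described above works.

```c
#include <assert.h>
#include <stddef.h>

typedef union ev_arg { int ev_int; void *ev_ptr; } ev_arg_t;
struct event;
typedef void (*ev_func_t)(struct event *ev, ev_arg_t arg);
typedef struct event {
	ev_func_t ev_func;
	ev_arg_t ev_arg;
	struct event *ev_next;
} event_t;

static event_t *ev_head, *ev_tail;

static void my_ev_enqueue(event_t *ev, ev_func_t func, ev_arg_t arg)
{
	assert(func != NULL);
	ev->ev_func = func;
	ev->ev_arg = arg;
	ev->ev_next = NULL;
	if (ev_head == NULL)
		ev_head = ev;
	else
		ev_tail->ev_next = ev;
	ev_tail = ev;
}

static int my_ev_in_queue(const event_t *ev)
{
	return ev->ev_func != NULL;
}

/* Run queued events in FIFO order; each is dequeued before its
 * function is called, so the function may enqueue it again. */
static void my_ev_process(void)
{
	while (ev_head != NULL) {
		event_t *ev = ev_head;
		ev_func_t func = ev->ev_func;

		ev_head = ev->ev_next;
		ev->ev_func = NULL;         /* now "not in queue" */
		func(ev, ev->ev_arg);
	}
}

/* Demo callback: records each ev_int it is called with. */
static int ev_log[8], ev_log_len;
static void log_int(event_t *ev, ev_arg_t arg)
{
	(void)ev;
	if (ev_log_len < 8)
		ev_log[ev_log_len++] = arg.ev_int;
}
```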


0043226                   }
0043227                   else
0043228                            ip_port->ip_loopb_tail->acc_ext_link= data;
0043229                   ip_port->ip_loopb_tail= data;
0043230 
0043231                   return NW_OK;
0043232          }
0043233 
0043234          if (dstaddr == (ipaddr_t)-1)


0043235          {
0043236                   r= (*ip_port->ip_dev_send)(ip_port, dstaddr, data,
0043237                            /* broadcast */ 1);
The ip_dev_send field is set to either a reference to ipeth_send() or to ipps_send().


ipeth_send()


ipeth_send() is called (indirectly) by ip_send() to send out a packet to a destination address on the same subnet as the ip port from which it is sent or to send out a packet to a broadcast address. ipeth_send() first creates an ethernet header to prepend to the ip packet and then, if there are no packets waiting to be sent out, calls eth_send() in an attempt to send the packet to the ethernet task immediately. If eth_send() is not able to send the ethernet packet immediately, eth_write() is called to queue the packet. If there are already ethernet packets waiting to be sent out, eth_send() and eth_write() are not called and the packet is queued (i.e., placed in the de_q_head queue of the ip port).


0043238                   return r;
0043239          }
0043240          netmask= ip_get_netmask(dstaddr);
ip_get_netmask()

ip_get_netmask(hostaddr) simply returns the natural subnet mask of hostaddr, the only parameter to ip_get_netmask().

For a class A network, the subnet mask is 255.000.000.000 (0xff000000);
for a class B network, the subnet mask is 255.255.000.000 (0xffff0000);
for a class C network, the subnet mask is 255.255.255.000 (0xffffff00);
for a zero network type (0.xx.xx.xx), the subnet mask is (0x00000000).
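The classful rule listed above can be sketched on host-byte-order addresses. natural_netmask() is hypothetical; the text does not say what ip_get_netmask() returns for addresses of 224.0.0.0 and above, so returning 0 for them here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t natural_netmask(uint32_t addr)
{
	if ((addr & 0xff000000u) == 0)
		return 0x00000000u;     /* zero network, 0.x.x.x */
	if ((addr & 0x80000000u) == 0)
		return 0xff000000u;     /* class A: leading bit 0 */
	if ((addr & 0xc0000000u) == 0x80000000u)
		return 0xffff0000u;     /* class B: leading bits 10 */
	if ((addr & 0xe0000000u) == 0xc0000000u)
		return 0xffffff00u;     /* class C: leading bits 110 */
	return 0;                       /* 224.0.0.0 and above: assumption */
}
```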


0043241 
0043242          if (dstaddr == ip_port->ip_ipaddr)
The case of the destination ip address being the same as the ip address of the ip port is very similar to the local loopback case. Therefore, this block is very similar to the block that begins on line 43212.


0043243          {
0043244                   assert (data->acc_linkC == 1);
0043245 
0043246                   data->acc_ext_link= NULL;
0043247                   if (ip_port->ip_loopb_head == NULL)
0043248                   {
0043249                            ip_port->ip_loopb_head= data;
0043250                            arg.ev_ptr= ip_port;
0043251                            ev_enqueue(&ip_port->ip_loopb_event,
0043252                                     ip_process_loopb, arg);


0043253                   }
0043254                   else
0043255                            ip_port->ip_loopb_tail->acc_ext_link= data;
0043256                   ip_port->ip_loopb_tail= data;
0043257 
0043258                   return NW_OK;
0043259          }
0043260 
0043261          if (((dstaddr ^ ip_port->ip_ipaddr) & ip_port->ip_subnetmask) == 0)
Determine if the destination ip address is in the same network as the ip address of the ip port. If the two ip addresses are in the same network, the conditional will be TRUE. For example, with a subnet mask of 255.255.255.0, 192.168.1.1 (ip address of port) is not in the same network as 192.168.2.1 (destination ip address):

((dstaddr ^ ip_port->ip_ipaddr) & ip_port->ip_subnetmask) == 0
((192.168.2.1 ^ 192.168.1.1) & 255.255.255.0) == 0
(0.0.3.0 & 255.255.255.0) == 0
0.0.3.0 == 0 (this is *not* a TRUE statement; therefore, the conditional is FALSE)


0043262          {
0043263                   broadcast= (dstaddr == (ip_port->ip_ipaddr |
0043264                            ~ip_port->ip_subnetmask));
Determine if the destination ip address is the broadcast address of the ip port's network (all host bits set). If it is, the conditional will be TRUE. For example, if the ip address of the ip port is 192.168.1.1, the subnet mask is 255.255.255.0 (so ~ip_subnetmask is 0.0.0.255), and the destination address is 192.168.1.255:

192.168.1.255 == (192.168.1.1 | 0.0.0.255)
192.168.1.255 == 192.168.1.255

The conditional is TRUE - therefore, the destination ip address is the broadcast address.
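Both tests reduce to one line each. The helper names below are hypothetical, and addresses are taken in host byte order; the tests give the same yes/no answers on network-order values because they use only bitwise operators and equality.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if dst is on the port's network (the test on line 43261). */
static int same_network(uint32_t dst, uint32_t port_addr, uint32_t mask)
{
	return ((dst ^ port_addr) & mask) == 0;
}

/* 1 if dst is that network's broadcast address (lines 43263-43264):
 * the port's network bits with every host bit set. */
static int is_subnet_broadcast(uint32_t dst, uint32_t port_addr, uint32_t mask)
{
	return dst == (port_addr | ~mask);
}
```

With 192.168.1.1 as 0xC0A80101 and the mask as 0xFFFFFF00, 192.168.2.1 (0xC0A80201) XORs to 0x00000300, i.e., 0.0.3.0, so same_network() is 0, while is_subnet_broadcast() is 1 for 192.168.1.255 (0xC0A801FF).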


0043265 
0043266                   r= (*ip_port->ip_dev_send)(ip_port, dstaddr, data,
0043267                                                             broadcast);
If the ip port's underlying layer is ethernet, ip_dev_send will be ipeth_send().




0043268                   return r;
0043269          }
0043270 
0043271          r= oroute_frag (ip_port - ip_port_table, dstaddr, ttl, &nexthop);
The destination address is in a different network than the ip address of the ip port out of which the packet is going.


oroute_frag()


oroute_frag(port_nr, dest, ttl, nexthop) calls oroute_find_ent() to find an output route in the output route cache or the output routing table for the destination dest, oroute_frag()'s second parameter. If a route is found, the ip address of the destination's gateway (the next hop) is returned through nexthop, oroute_frag()'s last parameter, which is a pointer.


0043272 
0043273          if (r == NW_OK)
0043274          {
0043275                   if (nexthop == ip_port->ip_ipaddr)
The case of the next-hop for the destination ip address being the same as the ip address of the ip port is very similar to the local loopback case. Therefore, this block is very similar to the block that begins on line 43212.

It's a little unclear when someone would set the next-hop of a route to the ip address of the ip port.


0043276                   {
0043277                            data->acc_ext_link= NULL;
0043278                            if (ip_port->ip_loopb_head == NULL)
0043279                            {
0043280                                     ip_port->ip_loopb_head= data;
0043281                                     arg.ev_ptr= ip_port;
0043282                                     ev_enqueue(&ip_port->ip_loopb_event,
0043283                                              ip_process_loopb, arg);


0043284                            }
0043285                            else
0043286                                     ip_port->ip_loopb_tail->acc_ext_link= data;
0043287                            ip_port->ip_loopb_tail= data;
0043288                   }
0043289                   else
0043290                   {
0043291                            r= (*ip_port->ip_dev_send)(ip_port,
0043292                                     nexthop, data, /* no broadcast */ 0);
If the ip port's underlying layer is ethernet, ip_dev_send will be ipeth_send().




0043293                   }
0043294          }
0043295          else
If a route for the destination ip address didn't exist, simply free the packet. If a default route is configured, this shouldn't happen.


0043296          {
0043297                   DBLOCK(0x10, printf("got error %d\n", r));
0043298                   bf_afree(data);


0043299          }
0043300          return r;
0043301 }
0043302 
0043303 PUBLIC void ip_hdr_chksum(ip_hdr, ip_hdr_len)
0043304 ip_hdr_t *ip_hdr;
0043305 int ip_hdr_len;
There doesn't seem to be much justification for making this a function.




0043306 {
0043307          ip_hdr->ih_hdr_chk= 0;
0043308          ip_hdr->ih_hdr_chk= ~oneC_sum (0, (u16_t *)ip_hdr, ip_hdr_len);
oneC_sum()

A checksum is used to determine if errors occurred during the transmission of data. The checksum algorithm used by oneC_sum() (which is also the Internet standard) is described by RFC 1071.

Essentially, the algorithm treats the data as a sequence of 16-bit words and adds them together using one's complement addition. The high 16 bits of the resulting 32-bit sum are then added to the low 16 bits (again using one's complement addition, so any carry is folded back in). The checksum field is then set to the one's complement of this 16-bit sum. (Recall that the one's complement of 0xF0F0 is 0x0F0F.) The one's complement sum of any 16-bit number and its one's complement is 0xFFFF. Therefore, summing the data together with the stored checksum field yields 0xFFFF, provided the data was not corrupted after the checksum was calculated. For example, the one's complement sum of an ip header, including its checksum field, is 0xFFFF if the header was not corrupted in transit.
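The header-checksum procedure described above can be sketched as follows. ones_sum() is a hypothetical helper that plays the role described for oneC_sum(): a folded 16-bit one's complement sum that starts from a running initial value. Its signature is an assumption and is not taken from the network service. set_hdr_chksum() mirrors the two statements of ip_hdr_chksum(), assuming the checksum sits at bytes 10-11 of the header as in RFC 791.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical oneC_sum()-like helper: one's complement sum of len
 * bytes, folded to 16 bits, starting from prev. Words are summed in
 * host byte order; as RFC 1071 explains, the result is still correct
 * once it is stored back in the same byte order.
 */
uint16_t ones_sum(uint16_t prev, const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t sum = prev;

    while (len > 1) {
        uint16_t word;
        memcpy(&word, p, sizeof word);   /* avoids unaligned 16-bit loads */
        sum += word;
        p += 2;
        len -= 2;
    }
    if (len > 0) {
        uint16_t word = 0;
        memcpy(&word, p, 1);             /* odd byte, padded with a zero byte */
        sum += word;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Like ip_hdr_chksum(): clear the checksum field (bytes 10-11 of an
 * IPv4 header), then store the complement of the sum over the header. */
void set_hdr_chksum(unsigned char *hdr, size_t hdr_len)
{
    uint16_t chk;

    memset(hdr + 10, 0, 2);
    chk = (uint16_t)~ones_sum(0, hdr, hdr_len);
    memcpy(hdr + 10, &chk, sizeof chk);
}
```

For the sample header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7, set_hdr_chksum() stores b8 61. Running ones_sum() over the finished header, checksum field included, then gives 0xFFFF.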


From RFC 1071:

In outline, the Internet checksum algorithm is fairly simple:

(1) Adjacent octets to be checksummed are paired to form 16-bit
integers, and the 1's complement sum of these 16-bit integers is
formed.

(2) To generate a checksum, the checksum field itself is cleared,
the 16-bit 1's complement sum is computed over the octets
concerned, and the 1's complement of this sum is placed in the
checksum field.

(3) To check a checksum, the 1's complement sum is computed over the
same set of octets, including the checksum field. If the result
is all 1 bits (-0 in 1's complement arithmetic), the check
succeeds.

Below is a "C" code algorithm that describes the process above. This algorithm is also from RFC 1071. Note that count is the number of bytes still to be summed, and that the checksum is the one's complement of the folded sum.



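#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute Internet Checksum for "count" bytes
 * beginning at location "addr".
 */
uint16_t checksum(const void *addr, size_t count)
{
    const unsigned char *p = addr;
    uint32_t sum = 0;

    while (count > 1) {
        /* This is the inner loop */
        uint16_t word;
        memcpy(&word, p, sizeof word);  /* 16-bit word, no unaligned load */
        sum += word;
        p += 2;
        count -= 2;
    }

    /* Add left-over byte, if any (padded with a zero byte) */
    if (count > 0) {
        uint16_t word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }

    /* Fold 32-bit sum to 16 bits */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}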



0043309 }
0043310 
0043311 PUBLIC acc_t *ip_split_pack (ip_port, ref_last, first_size)
0043312 ip_port_t *ip_port;
0043313 acc_t **ref_last;
0043314 int first_size;
ip_split_pack()

ip_split_pack(ip_port, ref_last, first_size) is called by ipeth_restart_send() to split an ip packet into fragments if the packet is too large. A packet is too large if it exceeds the maximum ethernet payload: the 1514-byte maximum ethernet packet minus its 14-byte header, or 1500 bytes. ip_split_pack() returns a reference to the first fragment and returns a reference to the second fragment in ref_last, its second parameter. ref_last is also used to pass in the packet that is to be split.


0043315 {
0043316          int pack_siz;
0043317          ip_hdr_t *first_hdr, *second_hdr;
ip_split_pack() extracts the packet's header and examines its fields closely, so a solid understanding of an ip header's fields is beneficial.


ip_hdr_t


struct ip_hdr_t is the structure of an ip header. "ih" (e.g., ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr

{
u8_t ih_vers_ihl, ih_tos;
u16_t ih_length, ih_id, ih_flags_fragoff;
u8_t ih_ttl, ih_proto;
u16_t ih_hdr_chk;
ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header plus options (if there are any), counted in 32-bit words. In other words, the actual length in bytes is 4 times the value stored in those bits. An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits hold the version number (e.g., 4 for IPv4).
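Decoding ih_vers_ihl takes a mask and a shift. The sketch below assumes the masks 0x0f and 0xf0 for the two halves of the byte. ip_split_pack() uses a mask named IH_IHL_MASK for the lower half, and its value here is an assumption based on the field layout. The function names are hypothetical.

```c
#include <stdint.h>

#define SK_IHL_MASK  0x0f   /* assumed value of IH_IHL_MASK: lower 4 bits */
#define SK_VERS_MASK 0xf0   /* upper 4 bits: version */

/* Version number stored in the upper 4 bits (4 for IPv4). */
int ip_version(uint8_t vers_ihl)
{
    return (vers_ihl & SK_VERS_MASK) >> 4;
}

/* Header length in bytes: the lower 4 bits count 32-bit words. */
int ip_hdr_len(uint8_t vers_ihl)
{
    return (vers_ihl & SK_IHL_MASK) * 4;
}
```

A header without options has the byte 0x45 (version 4, 5 words or 20 bytes). The largest possible length value, 0x4f, gives 60 bytes, which leaves at most 40 bytes for options.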


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: The value of ih_id for the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, the IH_DONT_FRAG flag is set in ih_flags_fragoff. If more fragments follow this one (e.g., the 3rd fragment of 4 fragments), the IH_MORE_FRAGS flag is set.

If the packet is indeed just a fragment of a packet, the offset indicates the starting byte position (in 8-byte units) of the fragment's data within the original ip packet's data. For example, suppose an ip packet with 2000 bytes of data (not including the ip header) is broken up into two fragments of 1496 and 504 bytes. The first fragment has a fragmentation offset of 0 bytes, and the second fragment has a fragmentation offset of 1496 bytes. The offset stored in the second fragment's ih_flags_fragoff is therefore 187 (1496 / 8 = 187).
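The flag bits and the offset share one 16-bit field. The sketch below assumes the RFC 791 layout. IH_DONT_FRAG is 0x4000, as #define'd in include/net/gen/ip_hdr.h. The more-fragments bit 0x2000 and the 13-bit offset mask 0x1fff come from RFC 791 and are assumed to match the header file's IH_MORE_FRAGS and offset mask. The field is assumed to be in host byte order (i.e., after ntohs()), and the function names are hypothetical.

```c
#include <stdint.h>

#define SK_DONT_FRAG  0x4000  /* IH_DONT_FRAG */
#define SK_MORE_FRAGS 0x2000  /* assumed to equal IH_MORE_FRAGS (RFC 791) */
#define SK_OFF_MASK   0x1fff  /* low 13 bits: offset in 8-byte units */

/* Build a host-order flags/fragoff value from a byte offset, which
 * must be a multiple of 8. */
uint16_t make_flags_fragoff(unsigned byte_off, int more_frags)
{
    uint16_t v = (uint16_t)((byte_off / 8) & SK_OFF_MASK);

    if (more_frags)
        v |= SK_MORE_FRAGS;
    return v;
}

/* Byte offset of this fragment's data within the original payload. */
unsigned frag_byte_offset(uint16_t flags_fragoff)
{
    return (flags_fragoff & SK_OFF_MASK) * 8u;
}
```

For the 1496/504-byte example, the first fragment's field is 0x2000 (offset 0, more fragments follow), and the second fragment's field is 187.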


ih_ttl: "Time to live" for the packet. As a packet is routed to its destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an icmp "time exceeded" message to the source. The ttl is designed to prevent packets that can't reach their destination from bouncing around between routers indefinitely. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). This code was written before the ttl field was redefined to be strictly a hop count. The original IP RFC defines the ttl field as the time to live in seconds.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


ih_hdr_chk: Checksum for the header.


ih_src, ih_dst: Source and destination ip address of the ip packet.


IP HEADER (as given by RFC 791)


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



0043318          int first_hdr_len, second_hdr_len;
0043319          int first_data_len, second_data_len;
0043320          int new_first_data_len;
0043321          int first_opt_size, second_opt_size;
0043322          acc_t *first_pack, *second_pack, *tmp_pack, *tmp_pack1;
0043323          u8_t *first_optptr, *second_optptr;
0043324          int i, optlen;
0043325 
0043326          first_pack= *ref_last;
The packet to be split is passed by reference by ref_last.


0043327          *ref_last= 0;
0043328          second_pack= 0;
0043329 
0043330          first_pack= bf_packIffLess(first_pack, IP_MIN_HDR_SIZE);
Pack the ip header so that the fields can be extracted.


bf_packIffLess()


If the first accessor in a linked list holds fewer than min_len bytes (min_len is the second parameter), bf_packIffLess(pack, min_len) packs the data by calling bf_pack(). Otherwise, the packet is returned unchanged.

bf_packIffLess() is often called to ensure that a packet's header is in a single contiguous buffer so that the individual fields of the header can be easily accessed.



0043331          assert (first_pack->acc_length >= IP_MIN_HDR_SIZE);
0043332 
0043333          first_hdr= (ip_hdr_t *)ptr2acc_data(first_pack);
ptr2acc_data()

The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.


0043334          first_hdr_len= (first_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
IH_IHL_MASK selects the lower 4 bits of ih_vers_ihl, which hold the header length (options included) in 32-bit words. Multiplying by 4 gives first_hdr_len in bytes. The upper four bits of ih_vers_ihl hold the version number.


0043335 
0043336          pack_siz= bf_bufsize(first_pack);
bf_bufsize()

bf_bufsize() returns the total buffer size of a linked list of accessors (i.e., the sum of acc_length for the accessors in a linked list).



0043337          assert(pack_siz > first_size);
0043338 
0043339          if (first_hdr->ih_flags_fragoff & HTONS(IH_DONT_FRAG))
ip_split_pack() attempts to split the packet into fragments. If the ip header says the packet must not be fragmented (i.e., the header's IH_DONT_FRAG flag is set), the packet cannot be sent as is. In that case, an icmp unreachable message is sent to the packet's source and ip_split_pack() returns NULL.

IH_DONT_FRAG is #define'd in include/net/gen/ip_hdr.h:
#define IH_DONT_FRAG 0x4000


0043340          {
0043341                   icmp_snd_unreachable(ip_port->ip_port, first_pack,
0043342                            ICMP_FRAGM_AND_DF);
Send the source of the packet an icmp packet.

ICMP_FRAGM_AND_DF is #define'd in inet/generic/icmp.h:

# define ICMP_FRAGM_AND_DF 4


icmp_snd_unreachable()


icmp_snd_unreachable(port_nr, pack, code) builds an icmp unreachable packet (partially using the ip packet pack, icmp_snd_unreachable()'s second parameter) and then places the icmp unreachable packet in the outgoing queue. Icmp unreachable packets are sent if the network, host, or port number specified by the ip packet pack is unreachable.

The function first calls icmp_err_pack() to build a generic icmp packet. It then sets the ih_type field of the icmp header to ICMP_TYPE_DST_UNRCH and recomputes the icmp header's checksum, since the type and code fields have changed. Finally, it places the packet in the icmp port's write queue.


0043343                   return NULL;
0043344          }
0043345 
Lines 43346-43355 split the chain of accessors holding the packet into two chains, one for each fragment. In lines 43356-43360, a copy of the original packet's ip header is created and prepended to the second chain.


0043346          first_data_len= ntohs(first_hdr->ih_length) - first_hdr_len;
first_data_len is the length of the payload of the ip packet. In other words, it's the length of the packet excluding the ip header.


htons() / ntohs() / htonl() / ntohl()


From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of multi-byte values differently. For example, on a "little-endian" machine (such as an Intel CPU), the 16-bit value 0x1234 is stored in memory as the byte 0x34 followed by the byte 0x12. On a "big-endian" machine, it is stored as 0x12 followed by 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
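The conversion can also be written portably by assembling the value from bytes instead of swapping unconditionally. The helpers below are hypothetical illustrations, not the definitions in /include/net/hton.h. swap16() does what the quoted htons() macro does on a little-endian host. be16_load() and be16_store() read and write a big-endian (network order) field regardless of the host's byte order.

```c
#include <stdint.h>

/* Unconditional byte swap, as the quoted htons() macro performs. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Read a 16-bit network-order (big-endian) field from raw bytes. */
uint16_t be16_load(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Write a 16-bit value into raw bytes in network order. */
void be16_store(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}
```

An unconditional swap is only correct on a little-endian host. The byte-assembling form gives the same network-order bytes on either kind of machine.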


0043347          new_first_data_len= (first_size- first_hdr_len) & ~7;
0043348                   /* data goes in 8 byte chuncks */
The Fragment Offset field in an ip header holds the fragmentation offset divided by 8. As a result, the data in every fragment except the last must be a multiple of 8 bytes long. The data size of the first fragment is therefore rounded down to a multiple of 8 (& ~7 clears the lowest 3 bits).
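The size calculation on lines 43346-43349 can be reproduced in isolation. split_sizes() is a hypothetical helper. It takes first_size (the most bytes, header included, that one fragment may occupy), the header length, and the original payload length. It returns the payload carried by each of the two pieces.

```c
/* Hypothetical helper mirroring lines 43347 and 43349 of ip_split_pack(). */
void split_sizes(int first_size, int hdr_len, int data_len,
                 int *first_data, int *second_data)
{
    /* Round down: every fragment except the last must carry a
     * multiple of 8 data bytes. */
    *first_data = (first_size - hdr_len) & ~7;
    *second_data = data_len - *first_data;
}
```

With first_size 1500 and a 24-byte header, 1476 bytes of payload would fit, but only 1472 are used. The second piece's offset field then advances by 1472 / 8 = 184.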


0043349          second_data_len= first_data_len-new_first_data_len;
second_data_len is the remainder of the packet.


0043350          second_pack= bf_cut(first_pack, first_hdr_len+
0043351                   new_first_data_len, second_data_len);
bf_cut()

If a section of a linked list needs to be duplicated, bf_cut(data, offset, length) is called. For example, to duplicate a section of length 50 starting at offset 75 of a linked list, bf_cut(data, 75, 50) is called.



Note that the original linked list remains unchanged and that acc_linkC for all the accessors in the new linked list is one.

If length (the third parameter) is zero, bf_cut() simply duplicates the first accessor in the linked list but sets acc_length=0 and acc_next=null. In other words, it creates a linked list consisting of a single accessor whose acc_length is 0.

bf_cut() is used in a number of scenarios, including cutting a received ethernet packet to size.
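The sketch below illustrates the idea behind bf_cut() on a simplified, hypothetical accessor type. The new chain gets fresh accessors (each with acc_linkC of one) that point into the same buffers with adjusted offsets and lengths. Each shared buffer's reference count is incremented. The cut_* names and the use of malloc() are assumptions, since the real function draws accessors from the network service's own pools. Unlike bf_cut(), this sketch returns NULL for a zero length.

```c
#include <stdlib.h>

/* Hypothetical, simplified buffer and accessor types. */
typedef struct cut_buf {
    int buf_linkC;                 /* accessors referencing this buffer */
} cut_buf_t;

typedef struct cut_acc {
    int acc_linkC, acc_offset, acc_length;
    cut_buf_t *acc_buffer;
    struct cut_acc *acc_next;
} cut_acc_t;

/* Return a new chain describing bytes [offset, offset+length) of data.
 * The original chain is not modified. */
cut_acc_t *sketch_cut(cut_acc_t *data, int offset, int length)
{
    cut_acc_t *head = NULL, **tail = &head;

    /* Skip whole accessors that lie before the offset. */
    while (data != NULL && offset >= data->acc_length) {
        offset -= data->acc_length;
        data = data->acc_next;
    }
    while (data != NULL && length > 0) {
        int n = data->acc_length - offset;
        cut_acc_t *a = malloc(sizeof *a);

        if (a == NULL)
            abort();                       /* sketch: no recovery path */
        if (n > length)
            n = length;
        a->acc_linkC = 1;
        a->acc_buffer = data->acc_buffer;
        a->acc_buffer->buf_linkC++;        /* buffer is now shared */
        a->acc_offset = data->acc_offset + offset;
        a->acc_length = n;
        a->acc_next = NULL;
        *tail = a;
        tail = &a->acc_next;

        length -= n;
        offset = 0;
        data = data->acc_next;
    }
    return head;
}
```

For a chain of three 50-byte accessors, cutting 50 bytes at offset 75 yields two new accessors. The first covers bytes 25-49 of the second buffer, and the second covers bytes 0-24 of the third buffer.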




0043352          tmp_pack= first_pack;
0043353          first_data_len= new_first_data_len;
0043354          first_pack= bf_cut (tmp_pack, 0, first_hdr_len+first_data_len);



0043355          bf_afree(tmp_pack);
The original chain is no longer needed. first_pack and second_pack now refer to the data through their own accessors (bf_cut() leaves the original list unchanged), so tmp_pack is released with bf_afree().



0043356          tmp_pack= bf_memreq(first_hdr_len);
bf_memreq()

bf_memreq() allocates accessors, each with an associated buffer, to the caller. For example, suppose 1514 bytes of buffer space are requested immediately after the network process starts and each buffer is 512 bytes (the default). Two 512-byte buffers would hold only 1024 bytes, so three elements of accessors[] are removed from buf512_freelist, and bf_memreq() returns the head of the chain of these three accessors. The acc_linkC fields of the accessors and the buf_linkC fields of their buffers are set to one, and acc_length and acc_offset are set to their appropriate values.

So what happens if there are not enough buffers on the buf512_freelist to satisfy a request? On lines 2280-2290 of buf.c, functions that free buffers for the specific clients (e.g., eth_buffree()) are called until there are enough buffers on buf512_freelist.
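The arithmetic behind the 1514-byte example can be checked with two short hypothetical helpers. They assume that each accessor on buf512_freelist carries one 512-byte buffer and that every accessor in the returned chain except the last is filled completely. The second assumption is made for illustration and is not taken from buf.c.

```c
#define SK_BUF_SIZE 512   /* default buffer size from the text */

/* Number of 512-byte buffers (and accessors) needed for count bytes. */
int buffers_needed(int count)
{
    return (count + SK_BUF_SIZE - 1) / SK_BUF_SIZE;
}

/* acc_length of the i-th accessor (0-based), assuming every accessor
 * but the last is filled to SK_BUF_SIZE bytes. */
int acc_length_of(int count, int i)
{
    int rest = count - i * SK_BUF_SIZE;
    return rest < SK_BUF_SIZE ? rest : SK_BUF_SIZE;
}
```

Under these assumptions, a 1514-byte request is split across accessors of 512, 512 and 490 bytes.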



0043357          tmp_pack->acc_next= second_pack;
0043358          second_pack= tmp_pack;
0043359          second_hdr= (ip_hdr_t *)ptr2acc_data(second_pack);


0043360          *second_hdr= *first_hdr;
The second fragment's header is now identical to the first fragment's header. Only the fixed part of the header is copied, and the options are handled separately.


0043361          second_hdr->ih_flags_fragoff= htons(
0043362                   ntohs(first_hdr->ih_flags_fragoff)+(first_data_len/8));
The offset occupies the low 13 bits of ih_flags_fragoff and is counted in 8-byte units. Adding first_data_len/8 to the (host-order) field therefore advances the second fragment's offset past the data carried by the first fragment without disturbing the flag bits. Any flags copied from the original header are preserved, such as IH_MORE_FRAGS when the original packet was itself a non-final fragment.




0043363 
The second fragment does not yet carry any of the original packet's ip options. Line 43366 tests whether there are any ip options. If there are, the following block copies the options that must appear in every fragment from the first fragment to the second fragment.


0043364          first_opt_size= first_hdr_len-IP_MIN_HDR_SIZE;
0043365          second_opt_size= 0;
0043366          if (first_opt_size)
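If ip options are present, the options marked to be copied are copied from the first fragment's header into the second fragment's header buffer, directly after the minimal ip header. That buffer was allocated with room for first_hdr_len bytes on line 43356, so the copied options always fit.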


0043367          {
Lines 43368-43373 obtain a pointer (first_optptr) to the first fragment's ip options.


0043368                   first_pack= bf_packIffLess (first_pack,
0043369                            first_hdr_len);


0043370                   first_hdr= (ip_hdr_t *)ptr2acc_data(first_pack);


0043371                   assert (first_pack->acc_length>=first_hdr_len);
0043372                   first_optptr= (u8_t *)ptr2acc_data(first_pack)+
0043373                            IP_MIN_HDR_SIZE;


0043374                   second_optptr= (u8_t *)ptr2acc_data(
0043375                            second_pack)+IP_MIN_HDR_SIZE;
second_optptr points immediately after the (minimal) ip header, to where the first fragment's ip options will be copied.




0043376                   i= 0;
0043377                   while (i<first_opt_size)
When an ip packet is split, an ip option is copied to every fragment only if the option's copy flag is set. The copy flag is the high-order bit of the option's first (option-type) byte.


ip options


Ip options are an assortment of settings related to the ip protocol. However, most ip options involve the path that the ip packet takes to its destination. Ip options are discussed in detail on page 14 of RFC 791 as well as by the Data Network Resource.

Ip options are optional. If present, they are placed immediately after the fixed 20-byte ip header and before the transport header (e.g., the udp header of a udp packet).

The ip options can occupy at most 40 bytes, and their total length must be a multiple of 4 bytes (RFC 791), since the header length is expressed in 32-bit words. This is a real limitation for options that record a route: at 4 bytes per address, only a few router addresses fit.


0043378                   {
0043379                            switch (*first_optptr & IP_OPT_NUMBER)
IP_OPT_NUMBER is defined in /include/net/gen/ip_hdr.h as:
#define IP_OPT_NUMBER 0x1f

0x1f is binary 11111. It represents the lower five bits of the Option-Type field of an IP option, which identify the specific option.



0043380                            {
0043381                            case 0:
0043382                            case 1:
0043383                                     optlen= 1;
0043384                                     break;
Options 0 and 1 (the "end of option list" option and the "no-op" option) are single bytes with no length byte, so their length is 1.


0043385                            default:
0043386                                     optlen= first_optptr[1];
0043387                                     break;
For the other ip options, the second byte holds the length of the option, including the option-type and length bytes.


0043388                            }
0043389                            assert (i + optlen <= first_opt_size);
0043390                            i += optlen;
0043391                            if (*first_optptr & IP_OPT_COPIED)
IP_OPT_COPIED is defined in /include/net/gen/ip_hdr.h as:

#define IP_OPT_COPIED 0x80

0x80 is binary 10000000. If the high-order bit of an option's first (option-type) byte is 1, the option should be copied to the headers of any ip fragments that may be generated.


0043392                            {
0043393                                     second_opt_size += optlen;
0043394                                     while (optlen--)
0043395                                              *second_optptr++=
0043396                                                     *first_optptr++;
Copy each byte of the ip option from the original ip packet to the fragment.


0043397                            }
0043398                            else
0043399                                     first_optptr += optlen;
0043400                   }
0043401                   while (second_opt_size & 3)
Ensure (by padding with 0's) that the length of the ip options is a multiple of 4. This is required by RFC 791.
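The option-copying loop and the padding step can be sketched as a standalone function over plain byte arrays. copy_fragment_options() is hypothetical. It follows the same rules as the loop in the listing: options 0 and 1 are one byte long, other options carry their length in their second byte, and only options with the copied flag are kept. Unlike the original, it returns -1 for malformed lengths instead of relying on assert().

```c
#include <stdint.h>

/* Values as #define'd in include/net/gen/ip_hdr.h (quoted in the text). */
#define SK_OPT_NUMBER 0x1f   /* IP_OPT_NUMBER: low 5 bits, option number */
#define SK_OPT_COPIED 0x80   /* IP_OPT_COPIED: copy into every fragment */

/* Hypothetical helper: copy the options of src (src_len bytes) whose
 * copied flag is set into dst, then pad dst with zero bytes to a
 * multiple of 4. Returns the padded length, or -1 if an option length
 * is malformed. dst needs room for src_len + 3 bytes.
 */
int copy_fragment_options(const uint8_t *src, int src_len, uint8_t *dst)
{
    int i = 0, out = 0;

    while (i < src_len) {
        int k, optlen;

        switch (src[i] & SK_OPT_NUMBER) {
        case 0:                         /* end of option list */
        case 1:                         /* no operation */
            optlen = 1;
            break;
        default:                        /* type byte, length byte, data */
            if (i + 1 >= src_len)
                return -1;
            optlen = src[i + 1];
            if (optlen < 2)
                return -1;
            break;
        }
        if (i + optlen > src_len)
            return -1;                  /* option would overrun the area */

        if (src[i] & SK_OPT_COPIED) {
            for (k = 0; k < optlen; k++)
                dst[out++] = src[i + k];
        }
        i += optlen;
    }
    while (out & 3)
        dst[out++] = 0;                 /* pad to a 4-byte boundary */
    return out;
}
```

Consider options made up of a no-op, a 7-byte option with the copied flag set (type 0x83, loose source routing), a 7-byte record-route option (type 0x07) and an end-of-list byte. Only the 0x83 option is kept, and it is padded to 8 bytes.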


0043402                   {
0043403                            *second_optptr++= 0;
0043404                            second_opt_size++;
0043405                   }
0043406          }
0043407          second_hdr_len= IP_MIN_HDR_SIZE + second_opt_size;
The second fragment's header length is the length of the minimal ip header plus the length of the (padded) options copied into it, if any.


0043408 
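Lines 43409-43412 set ih_vers_ihl (the version and the length of the ip header including options) and ih_length (the length of the whole fragment) in the second fragment's header. Line 43413 sets the acc_length field of the accessor that holds that header. Note that in C, + binds more tightly than &. Lines 43409-43410 therefore compute ih_vers_ihl & (0xf0 + second_hdr_len/4) rather than (ih_vers_ihl & 0xf0) + second_hdr_len/4. The result is still correct when the second header is as long as the original (for example, when there are no options), but it can be wrong when only some options are copied.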
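In C, + binds more tightly than &, so an expression of the form x & 0xf0 + n is parsed as x & (0xf0 + n). The sketch below contrasts that parse with the parenthesized form (x & 0xf0) + n, which keeps the version bits and stores a new header length. set_ihl_intended() and set_ihl_as_parsed() are hypothetical names used only for this comparison.

```c
#include <stdint.h>

/* Keep the version bits and store the new header length in words. */
uint8_t set_ihl_intended(uint8_t vers_ihl, int hdr_len)
{
    return (uint8_t)((vers_ihl & 0xf0) + hdr_len / 4);
}

/* The same expression without parentheses: + is applied before &. */
uint8_t set_ihl_as_parsed(uint8_t vers_ihl, int hdr_len)
{
    return (uint8_t)(vers_ihl & 0xf0 + hdr_len / 4);
}
```

When the header length is unchanged (0x45 with 20 bytes), both forms give 0x45. For a 32-byte header (0x48) shrinking to 24 bytes, the intended form gives 0x46, while the unparenthesized form gives 0x40.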


0043409          second_hdr->ih_vers_ihl= second_hdr->ih_vers_ihl & 0xf0
0043410                   + (second_hdr_len/4);
0043411          second_hdr->ih_length= htons(second_data_len+
0043412                   second_hdr_len);


0043413          second_pack->acc_length= second_hdr_len;
0043414 
0043415          assert(first_pack->acc_linkC == 1);
0043416          assert(first_pack->acc_buffer->buf_linkC == 1);
0043417 
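Line 43418 sets the IH_MORE_FRAGS flag in the first fragment's header (after all, a second fragment follows it). The second fragment's header was copied before this line, so it keeps whatever flags the original header had. It carries IH_MORE_FRAGS only if the original packet was itself a non-final fragment. Line 43419 calculates the new total length of the first fragment.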


0043418          first_hdr->ih_flags_fragoff |= HTONS(IH_MORE_FRAGS);
0043419          first_hdr->ih_length= htons(first_data_len+
0043420                   first_hdr_len);
0043421          assert (!(second_hdr->ih_flags_fragoff & HTONS(IH_DONT_FRAG)));
0043422 
0043423          ip_hdr_chksum(first_hdr, first_hdr_len);


0043424          if (second_data_len+second_hdr_len <= first_size)
0043425          {
0043426                   /* second_pack will not be split any further, so we have to
0043427                    * calculate the header checksum.
0043428                    */
0043429                   ip_hdr_chksum(second_hdr, second_hdr_len);


0043430          }
0043431 
ip_split_pack() returns the first fragment and the second fragment is returned in ref_last, the second parameter to ip_split_pack().


0043432          *ref_last= second_pack;
0043433          return first_pack;
0043434 }
0043435 
0043436 PRIVATE void error_reply (ip_fd, error)
0043437 ip_fd_t *ip_fd;
0043438 int error;
error_reply() / ip_write

error_reply(ip_fd, error) simply calls the function referenced by the if_get_userdata field of the ip file descriptor ip_fd (error_reply()'s first parameter), passing error as the function's second argument.

If a process opened up an ip device file (e.g., /dev/ip) directly and therefore opened up the ip file descriptor ip_fd, if_get_userdata was set to sr_get_userdata(). If this is the case, sr_get_userdata() simply sends the file system an error code or the number of bytes written.

If the udp code opened up the ip file descriptor ip_fd, if_get_userdata was set to udp_get_data(). If this is the case, udp_get_data() simply removes the packet from the up_wr_pack field and turns off the write flag for the udp file descriptor. Note that in this scenario, error (error_reply()'s second parameter) is not used.

If the icmp code opened up the ip file descriptor, if_get_userdata was set to icmp_getdata().

Note that error_reply() is somewhat of a misnomer. Even if an error did not occur, error_reply() is called.
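The dispatch through if_get_userdata is an ordinary C function pointer. The sketch below uses a hypothetical, reduced descriptor type and callback signature modeled on the call on lines 43440-43441. The callback receives a descriptor number, an offset that carries the error code, a count and a for_ioctl flag. It is assumed to return a pointer, which error_reply() expects to be null. The callback and its bookkeeping variables are illustrative only.

```c
#include <stddef.h>

/* Hypothetical callback type modeled on if_get_userdata. */
typedef void *(*get_userdata_t)(int fd, size_t offset, size_t count,
                                int for_ioctl);

/* Hypothetical, reduced ip file descriptor. */
typedef struct sk_ip_fd {
    int if_srfd;                    /* descriptor of the layer above */
    get_userdata_t if_get_userdata; /* set when the descriptor is opened */
} sk_ip_fd_t;

static int last_fd = -1;            /* what the callback last received */
static size_t last_offset;

/* Stand-in for a higher layer's "get data" function. */
static void *record_error(int fd, size_t offset, size_t count, int for_ioctl)
{
    (void)count;
    (void)for_ioctl;
    last_fd = fd;
    last_offset = offset;           /* the error code travels here */
    return NULL;                    /* no data for a zero count */
}

/* Like error_reply(): hand the error code to whoever opened the fd.
 * Returns -1 for a non-null result, which error_reply() treats as
 * fatal (ip_panic()). */
int sketch_error_reply(sk_ip_fd_t *ip_fd, int error)
{
    void *r = (*ip_fd->if_get_userdata)(ip_fd->if_srfd, (size_t)error,
                                        (size_t)0, 0);
    return r == NULL ? 0 : -1;
}
```

Because the callback is chosen when the descriptor is opened, the ip code never needs to know whether a process, the udp code or the icmp code is on the other end.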


0043439 {
0043440          if ((*ip_fd->if_get_userdata)(ip_fd->if_srfd, (size_t)error,
0043441                   (size_t)0, FALSE))
if_get_userdata is the higher-layer "get data" function. For example, when the udp code opens an ip file descriptor (by calling ip_open()), the if_get_userdata field is set to udp_get_data() and the if_srfd is set to the ip file descriptor's associated udp port. udp_get_data() returns the packet that was obtained from the process requesting the write.

For the icmp layer, if_get_userdata is set to icmp_getdata(). Again, icmp_getdata() returns the packet that was obtained from the process requesting the write.


udp_get_data()


udp_get_data() is called only indirectly by the ip code. udp_get_data() has a number of uses, which makes it a somewhat difficult function to understand.

During the initialization of a udp port (when the state of the udp port is UPS_SETPROTO), udp_get_data() is called twice. The first time udp_get_data() is called, the ip code tries to initialize the udp port's underlying ip file descriptor. udp_get_data() is called the second time (through reply_thr_get()) to change the state of the udp port to UPS_GETCONF.

After initialization (when the state of the udp port is UPS_MAIN), udp_get_data() is called to send either packets or configuration data to the underlying layer (i.e., the ip layer). For a write to a udp file descriptor, a packet is placed in the udp file descriptor's associated udp port's write field and udp_get_data() is then called by ip_write() to move the data to the ip layer. If ip_ioctl() is called to configure a udp port, ip_ioctl() calls udp_get_data() to get configuration data for the port (including, for example, the ip address of the port).

After a write operation is called, udp_get_data() is called a second time to clear some of the udp port's fields in preparation for the next write.


0043442          {
0043443 #if !CRAMPED
0043444                   ip_panic(( "can't error_reply" ));
0043445 #endif
0043446          }
0043447 }
0043448 
0043449 /*
0043450  * $PchId: ip_write.c,v 1.7.1.1.1.1 2001/01/22 19:59:07 philip Exp $
0043451  */