CSC4303 Network Programming

Reference

https://book.systemsapproach.org/foundation.html

Network Model

Layer	Description	Example
Application	Programs that use network services	HTTP, DNS, CDNs
Transport	Provides end-to-end data delivery	TCP, UDP
Network	Send packets over multiple networks	IP, NAT, BGP
Link	Send frames over one or more links	Ethernet, 802.11
Physical	Send bits using signals	wires, fiber, wireless

Transport Layer

UDP

Connectionless

Each packet is independent.

Sender    Time       Receiver
  |                     |
  |----- Packet1 -----> |
  |                     |
  |----- Packet2 -----> |
  |                     |

Buffering

UDP buffering is a "temporary storage area" maintained by the operating system for each UDP port, where arriving packets queue up when applications can't process them immediately.

Application A  Application B   Application C
    ↓               ↓               ↓
[Port X]        [Port Y]        [Port Z]      ← Port mapping
    ↓               ↓               ↓
[Queue 1]       [Queue 2]       [Queue 3]     ← Independent UDP message queues
    │               │               │
    └───────┬───────┴───────┬───────┘
            ↓               ↓
    [Port Multiplexer/Demultiplexer]          ← Routes by port number
            ↓
    [Incoming UDP packets]

Header

Note that the 16-bit Length field limits a datagram (header plus data) to 65,535 bytes, i.e. about 64 KB.

     32-bit width (4 bytes per row)
0                  16                31
┌──────────────────┬──────────────────┐
│ Source Port      │ Destination Port │ ← Port addressing
│   (16 bits)      │   (16 bits)      │
├──────────────────┴──────────────────┤
│    Length        │    Checksum      │ ← Size & integrity
│    (16 bits)     │    (16 bits)     │
├─────────────────────────────────────┤
│                                     │
│           Application Data          │
│                                     │
└─────────────────────────────────────┘
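
As a rough illustration of the layout above, the following C sketch serializes the four 16-bit fields into the 8-byte header in network byte order. The names `udp_header`, `put_be16`, and `udp_header_pack` are hypothetical, and the checksum is supplied by the caller rather than computed.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the four UDP header fields (host byte order). */
struct udp_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;   /* header (8 bytes) + application data */
    uint16_t checksum; /* supplied by the caller; not computed in this sketch */
};

/* Write one 16-bit value in big-endian (network) byte order. */
void put_be16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}

/* Serialize the header into exactly 8 bytes, in the field order of the diagram. */
void udp_header_pack(const struct udp_header *h, uint8_t out[8])
{
    put_be16(out + 0, h->src_port);
    put_be16(out + 2, h->dst_port);
    put_be16(out + 4, h->length);
    put_be16(out + 6, h->checksum);
}
```

Writing the bytes one at a time keeps the result independent of the host's own byte order.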

TCP

Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        | ← Identifies sockets
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number (32)                   | ← Byte-based sequencing
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number (32)                 | ← Next expected byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data Offset  |  0  |  Flags  |       Window Size (16)        | ← Control & flow control
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Checksum (16)             |       Urgent Pointer (16)     | ← Integrity & urgent data
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Options (variable)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Data                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Connection Establishment (Setup)

Three-Way Handshake

Both parties send Initial Sequence Numbers (ISNs) via SYNchronize segments. Each party acknowledges the other's sequence number using ACKnowledge segments.

Step	Initiator (Client)	Receiver (Server)	Description
First Handshake	Sends $SYN = 1, seq = x$	Waits for connection request	Client requests a connection and sends its ISN $x$.
Second Handshake	Waits for acknowledgment	Sends $SYN = 1, ACK = 1, seq = y, ack = x + 1$	Server agrees to connect, sends its own ISN $y$, and acknowledges receipt of $x$.
Third Handshake	Sends $ACK = 1, seq = x + 1, ack = y + 1$	Connection established	Client acknowledges receipt of $y$, and the connection is established.

The three-way handshake prevents a server from wasting resources on stale or duplicate connection requests, because the connection is established only after the client's final acknowledgment.

State Machine
- Client path: CLOSED → connect() → SYN_SENT → receive SYN+ACK → ESTABLISHED
- Server path: CLOSED → listen() → LISTEN → receive SYN → SYN_RCVD → receive ACK → ESTABLISHED
- Both parties run instances of this state machine, and TCP allows for simultaneous open.

Connection Release (Teardown)

Four-Way Handshake (symmetric)

Step	Initiator (Active Closer)	Receiver (Passive Closer)	Description
First Wave	Sends $FIN = 1, seq = u$	Waits for close request	Initiator has finished sending data and requests to close its sending channel.
Second Wave	Waits for remaining data	Sends $ACK = 1, ack = u + 1$	Receiver acknowledges the close request but may still have data to send.
Third Wave	Waits for confirmation	Sends $FIN = 1, seq = v$	Receiver has finished sending data and requests to close its sending channel.
Fourth Wave	Sends $ACK = 1, ack = v + 1$	Connection closed	Initiator acknowledges the close request.

State Machine
- Active closer path: ESTABLISHED → close() → FIN_WAIT_1 → receive ACK → FIN_WAIT_2 → receive FIN → TIME_WAIT → timeout → CLOSED
- Passive closer path: ESTABLISHED → receive FIN → CLOSE_WAIT → close() → LAST_ACK → receive ACK → CLOSED
- TIME_WAIT state
  - The active closer waits $2 \times MSL$ (Maximum Segment Lifetime) before moving to CLOSED.
  - If the final ACK is lost, the retransmitted FIN can still be acknowledged.
  - Old segments from this connection expire before a new connection can reuse the same addresses and ports.

Flow Control

Automatic Repeat reQuest (ARQ)

ARQ with one message at a time is Stop-and-Wait: the sender may have only a single unacknowledged message outstanding.

Sliding Window

It allows $W$ outstanding packets, enabling pipelining to send multiple packets per RTT for improved performance.

The sender buffers up to $W$ segments until they are acknowledged. The last frame sent minus the last acknowledgment received ($LAR$) must be no more than $W$.
- Go-Back-N
  
  The receiver buffers only the next expected packet ($LAS + 1$). It accepts packets only in sequence, discards all others, and sends cumulative ACKs.
- Selective Repeat
  
  Buffers entire window. Stores out-of-order packets, sends individual ACKs, retransmits only lost packets.
- Sequence Number
  
  An $n$-bit counter wraps around to 0 after $2^{n} - 1$. Let $LAR$ be the Last Acknowledgement Received and $LAS$ the Last Acknowledgement Sent.
  

Method	Sender's Range	Receiver's Range	Min Number Needed to Avoid Overlap
Go-Back-N	$[LAR + 1, LAR + W]$	$LAS + 1$	$W + 1$
Selective Repeat	$[LAR + 1, LAR + W]$	$[LAS + 1, LAS + W]$	$2 W$
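
To make the table concrete, here is a minimal C sketch of a window membership test under wraparound, together with the two minimum sequence-space sizes. The function names are hypothetical. The sketch assumes `seq` and `first` are already reduced modulo `space` and that `space` is small enough for `seq + space` not to overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seq lies in the window [first, first + w - 1], with all arithmetic
 * taken modulo `space` (the number of distinct sequence numbers).
 * For the sender, first = LAR + 1; for a Selective Repeat receiver,
 * first = LAS + 1. */
bool seq_in_window(uint32_t seq, uint32_t first, uint32_t w, uint32_t space)
{
    uint32_t distance = (seq + space - first) % space; /* steps ahead of first */
    return distance < w;
}

/* Minimum number of distinct sequence numbers, as listed in the table. */
uint32_t min_seq_space_go_back_n(uint32_t w) { return w + 1; }
uint32_t min_seq_space_selective_repeat(uint32_t w) { return 2 * w; }
```

For example, with 8 sequence numbers and $W = 4$, a window starting at 6 covers 6, 7, 0, and 1.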

Pacing

ACK Clocking (Sender)

ACK clocking is a feedback mechanism in which returning ACKs pace the sender: ACKs arrive at the rate at which the bottleneck link delivered data, so new packets enter the network at that same rate. This limits queue buildup and keeps latency low.

Flow Control (Receiver)

Flow control uses the WIN field, calculated as WIN = ReceiveBuffer - (LastByteRcvd - LastByteRead), to dynamically limit the sender's window and prevent receiver buffer overflow.

The sender may transmit a segment only if it fits within the advertised window:

$SEQ + length \le ACK + WIN$
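
The two sides of flow control can be sketched in C as follows. The function names are hypothetical, sequence-number wraparound is ignored, and the boundary case is an assumption: a segment is treated as sendable when its last byte still falls inside the window, i.e. when SEQ + length is at most ACK + WIN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Receiver side: free buffer space to advertise in the WIN field. */
uint32_t advertised_window(uint32_t receive_buffer,
                           uint32_t last_byte_rcvd,
                           uint32_t last_byte_read)
{
    uint32_t buffered = last_byte_rcvd - last_byte_read; /* received, not yet read */
    return buffered >= receive_buffer ? 0 : receive_buffer - buffered;
}

/* Sender side: may a segment of `length` bytes starting at `seq` be sent,
 * given the latest acknowledgment number and advertised window? */
bool fits_in_window(uint32_t seq, uint32_t length, uint32_t ack, uint32_t win)
{
    return seq + length <= ack + win;
}
```

When the application stops reading, the buffered amount grows, the advertised window shrinks toward zero, and the sender stalls.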

Adaptive Timeout

Name	Formula
$SRTT$ (average round-trip time)	$SRTT_{n + 1} = (1 - \alpha) \cdot SRTT_{n} + \alpha \cdot RTT_{n + 1}$
$Svar$ (variability of RTT)	$Svar_{n + 1} = (1 - \beta) \cdot Svar_{n} + \beta \cdot \lvert RTT_{n + 1} - SRTT_{n + 1} \rvert$
$Timeout$	$Timeout_{n} = SRTT_{n} + 4 \cdot Svar_{n}$
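
The table's update rules can be sketched in C as below. The struct and function names are hypothetical, and the notes do not fix values for the two weights, so the caller chooses them (1/8 and 1/4 are the values RFC 6298 uses). Svar is updated with the already-updated SRTT, matching the indices in the table.

```c
/* Smoothed RTT estimator state; all times in milliseconds. */
struct rtt_estimator {
    double srtt;  /* SRTT: smoothed round-trip time */
    double svar;  /* Svar: smoothed deviation of the RTT */
    double alpha; /* weight of the new sample in SRTT */
    double beta;  /* weight of the new deviation in Svar */
};

/* Fold one RTT sample into the estimator: SRTT first, then Svar. */
void rtt_update(struct rtt_estimator *e, double rtt_sample)
{
    e->srtt = (1.0 - e->alpha) * e->srtt + e->alpha * rtt_sample;
    double dev = rtt_sample - e->srtt;
    if (dev < 0)
        dev = -dev; /* absolute deviation */
    e->svar = (1.0 - e->beta) * e->svar + e->beta * dev;
}

/* Retransmission timeout: the mean plus four deviations. */
double rtt_timeout(const struct rtt_estimator *e)
{
    return e->srtt + 4.0 * e->svar;
}
```

A single slow sample raises the timeout mainly through the deviation term, which keeps the timer from firing early on paths with variable delay.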

Congestion Control

TCP congestion control uses a sliding window (cwnd), interprets packet loss as a congestion signal, and adjusts the window via AIMD to achieve an efficient and roughly fair bandwidth allocation.
- Max-Min fairness
  
  An allocation is max-min fair if increasing the rate of one flow would decrease the rate of a smaller flow.
  
  Step 1 Initialize all flows at zero
  Step 2 Increase all flows equally
  Step 3 Freeze bottlenecked flows
  Step 4 Repeat for remaining flows
- Bandwidth allocation
  
  The network layer provides direct feedback and the transport layer reduces load. [The network is distributed; no single party has an overall picture of its state.]
  - Models
    - Open loop [reserve] & Closed loop [use feedback to adjust rates]
    - Host support & Network support
    - Window based & Rate based
    TCP is closed-loop, host-driven, and window-based.
  - AIMD (Additive Increase Multiplicative Decrease)
    - Hosts additively increase rate while network not congested
    - Hosts multiplicatively decrease rate when congested
- TCP Tahoe
  - Slow Start
    
    For each ACK received: $cwnd \leftarrow cwnd + 1$, so $cwnd$ doubles every RTT.
  - Later Additive Increase (AI)
    
    For each ACK received: $cwnd \leftarrow cwnd + \frac{1}{cwnd}$, which adds roughly 1 packet per RTT.
  - Switching Threshold (Initially Infinity)
    
    Switch to AI when $cwnd > ssthresh$. After a loss, set $ssthresh = \frac{cwnd}{2}$. After a timeout, begin again with slow start ($cwnd = 1$).
  - Fast Retransmit
    
    Because TCP ACKs are cumulative, duplicate ACKs signal that later packets arrived while an earlier one is missing. Treat three duplicate ACKs as a loss and retransmit immediately instead of waiting for the timeout.
  - Fast Recovery (TCP Reno)
    
    Fast recovery keeps data flowing during retransmission by maintaining the ACK clock, and it avoids resetting to slow start after a loss detected by duplicate ACKs. Set $ssthresh = \frac{cwnd}{2}$, set $cwnd = ssthresh$, and continue with AI.
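
The window rules above can be collected into a small C sketch. All names are hypothetical, the window is counted in packets, and `DBL_MAX` stands in for the "initially infinity" threshold. The case $cwnd = ssthresh$ is treated as additive increase here, which is an assumption of this sketch.

```c
#include <float.h>

/* Congestion state, in units of packets. */
struct tcp_cc {
    double cwnd;     /* congestion window */
    double ssthresh; /* slow-start threshold */
};

void cc_init(struct tcp_cc *c)
{
    c->cwnd = 1.0;
    c->ssthresh = DBL_MAX; /* stands in for "initially infinity" */
}

/* One new ACK: slow start below ssthresh, additive increase otherwise. */
void cc_on_ack(struct tcp_cc *c)
{
    if (c->cwnd < c->ssthresh)
        c->cwnd += 1.0;           /* doubles cwnd every RTT */
    else
        c->cwnd += 1.0 / c->cwnd; /* roughly +1 packet per RTT */
}

/* Timeout: halve the threshold and restart slow start from one packet. */
void cc_on_timeout(struct tcp_cc *c)
{
    c->ssthresh = c->cwnd / 2.0;
    c->cwnd = 1.0;
}

/* Three duplicate ACKs with fast recovery (Reno): halve, then continue in AI. */
void cc_on_triple_dup_ack(struct tcp_cc *c)
{
    c->ssthresh = c->cwnd / 2.0;
    c->cwnd = c->ssthresh;
}
```

The only difference between the two loss handlers is where $cwnd$ restarts: at 1 after a timeout, at the new $ssthresh$ after duplicate ACKs.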

UDP & TCP

Feature	TCP	UDP
Mode	Connections	Datagrams
Reliability	No loss, no duplicates, in-order delivery	May lose, reorder, or duplicate packets
Data Size	Unlimited (byte stream)	Limited (each message must fit in one datagram)
Flow Control	Flow control matches sender to receiver	Send regardless of receiver state
Congestion Control	Congestion control matches sender to network	Send regardless of network state

Socket

An endpoint for network communication that allows an application to attach to a specific port on the local network interface, enabling data exchange with other applications across the network.

Process

Client socket() -----------------------> connect() -> I/O -> close()

Server socket() -> bind() -> listen() -> accept() -> I/O -> close()

1. The server blocks in accept() on listenfd, waiting for incoming connections.
2. The client calls connect() to initiate the TCP handshake; this call also blocks.
3. When the handshake completes, the server's accept() returns a connfd and the client's connect() returns, establishing a bidirectional channel between clientfd and connfd.

Create: int socket(int domain, int type, int protocol);
- domain: AF_INET (IPv4) / AF_INET6 (IPv6).
- type: SOCK_STREAM (TCP) / SOCK_DGRAM (UDP).
- protocol: 0 (automatically select the default protocol).

Bind: int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

When a server binds a socket to a specific address, it establishes that data arriving at that address is read from this socket, and data written to this socket is sent from that address.
- socket address
```
struct sockaddr
{
    sa_family_t sa_family; // protocol family (e.g. AF_INET)
    char sa_data[14];      // address data
};
```
- sockaddr_in
```
struct sockaddr_in
{
    sa_family_t sin_family;    // always AF_INET
    in_port_t sin_port;        // port, in network byte order
    struct in_addr sin_addr;   // IPv4 address, in network byte order
    unsigned char sin_zero[8]; // pad to sizeof (struct sockaddr), placeholder
};
```
  Network data is transmitted in big-endian byte order. Use htons() and htonl() to convert host-ordered values into network-ordered short and long integers, respectively.
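
The effect of the conversion can be observed by copying the converted value into a byte buffer. The following C sketch does this for a port and an IPv4 address; `port_to_wire` and `addr_to_wire` are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy a port number into a 2-byte buffer exactly as it appears on the wire. */
void port_to_wire(uint16_t host_port, unsigned char out[2])
{
    uint16_t net_port = htons(host_port); /* host order -> network (big-endian) order */
    memcpy(out, &net_port, sizeof net_port);
}

/* Same idea for a 32-bit IPv4 address given in host order. */
void addr_to_wire(uint32_t host_addr, unsigned char out[4])
{
    uint32_t net_addr = htonl(host_addr);
    memcpy(out, &net_addr, sizeof net_addr);
}
```

On any host, port 8080 (0x1F90) comes out as the bytes 0x1F, 0x90, with the most significant byte first.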

Listen: int listen(int sockfd, int backlog);

listen () transforms a socket descriptor into a listening socket capable of accepting client connections. The backlog parameter suggests how many pending connections the kernel may queue before refusing new requests (typically around 128).

Accept: int accept(int listenfd, struct sockaddr *addr, socklen_t *addrlen);

accept() blocks until a client connects via listenfd, stores the client's address in addr, and returns a connfd for communication using standard Unix I/O functions.

Connect: int connect(int clientfd, const struct sockaddr *addr, socklen_t addrlen);

Attempts to establish a connection with the server at socket address addr.

To convert a string-formatted IP address (e.g., SERVER_IP), use inet_pton (int af, const char *src, void *dst) to transform it into binary network format.

Feature	`listenfd`	`connfd`
Purpose	Accepts connection requests	Handles data exchange with a client
Creation	Once at server start	Per accepted connection
Lifetime	Entire server runtime	Duration of client service
Quantity	One per port	One per active client
Concurrent Use	Listens for all clients	Dedicated to single client
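
Putting the calls together, a minimal sketch of the two setup paths might look like this in C. The helper names `tcp_listen` and `tcp_connect` are hypothetical. The server binds to the loopback address only, which is an assumption made to keep the example local, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket() -> bind() -> listen(). Returns listenfd, or -1 on error. */
int tcp_listen(uint16_t port, int backlog)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* assumption: loopback only */

    if (bind(listenfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listenfd, backlog) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}

/* Client side: socket() -> connect(). Returns clientfd, or -1 on error. */
int tcp_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1; /* not a valid dotted-decimal IPv4 string */

    int clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0)
        return -1;
    if (connect(clientfd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(clientfd);
        return -1;
    }
    return clientfd;
}
```

A server would follow `tcp_listen()` with `accept(listenfd, NULL, NULL)` to obtain a connfd for each client, then exchange data on it with read() and write().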

Send and Receive

ssize_t read (int fd, void *buf, size_t len); returns the number of bytes actually read (0 indicates the connection is closed, -1 indicates an error).

ssize_t write (int fd, const void *buf, size_t len); returns the number of bytes actually written (-1 indicates an error).

Close: int close(int sockfd);

Concurrency
- Threads/processes: race conditions increase complexity!
- I/O Multiplexing
  - select()
  - poll()
    
    poll() is a system call used to monitor multiple file descriptors concurrently for I/O readiness. Unlike select(), it imposes no fixed limit on the number of file descriptors that can be monitored.
```
struct pollfd
{
    int fd;        // file descriptor
    short events;  // requested events
    short revents; // returned events
};
```
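
As a small illustration of poll(), the following C sketch waits for a single descriptor to become readable before reading from it. The helper name `read_when_ready` and its -2 return value for a timeout are conventions of this sketch, not part of the poll() API.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns bytes read, 0 on end-of-file, -1 on error, or -2 on timeout. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN; /* ask to be told when data can be read */
    pfd.revents = 0;

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2; /* nothing became readable in time */
    return read(fd, buf, len);
}
```

A server handling many clients would pass an array with one `struct pollfd` per descriptor and inspect each `revents` field after poll() returns.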

Network Layer

Internet Protocol (IP)

IPv4 Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL   |Type of Service|        Total Length (16)      | ← Basic identification (IHL: Internet Header Length)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification (16)    |Flags|   Fragment Offset (13)  | ← Fragmentation control
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |        Header Checksum (16)   | ← Routing & integrity (Time to Live : TTL)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address (32)                     | ← Sender IP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address (32)                   | ← Receiver IP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options (if any, variable)                 | ← Optional features
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Payload (Data)                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

IP Addresses: An IP address is assigned to each network interface. Consequently, routers have multiple addresses (one per interface), while most hosts have just one or two (wired and wireless).
IP Prefixes: In a prefix of length L, the top L bits are identical across all addresses. In CIDR notation, a prefix is written as IP address/prefix length (e.g., 128.13.0.0/16 covers 128.13.0.0 to 128.13.255.255).
IP Forwarding: A router uses longest prefix matching to select the most specific route for a destination address.
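
Longest prefix matching can be sketched in C with a linear scan over a small table. The `route` struct and function names are hypothetical, addresses are held in host byte order, and a real router would typically use a trie or hardware lookup rather than a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* One forwarding-table entry: prefix/len -> next hop (an opaque id here). */
struct route {
    uint32_t prefix; /* IPv4 prefix in host byte order */
    int len;         /* prefix length, 0..32 */
    int next_hop;
};

/* Mask with the top `len` bits set. len == 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined in C. */
uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Return the next hop of the longest matching prefix, or -1 if none match. */
int lookup(const struct route *table, size_t n, uint32_t dst)
{
    int best_len = -1, best_hop = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].len);
        if ((dst & mask) == (table[i].prefix & mask) && table[i].len > best_len) {
            best_len = table[i].len;
            best_hop = table[i].next_hop;
        }
    }
    return best_hop;
}
```

With entries for 0.0.0.0/0, 128.13.0.0/16, and 128.13.4.0/24, the address 128.13.4.7 matches all three and is forwarded by the /24 entry.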

IP fragmentation

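MTU: Maximum Transmission Unit, the largest packet a link can carry.
MSS: Maximum Segment Size, the largest payload TCP places in one segment.

When a datagram exceeds the MTU of the next link, it is split into fragments, and each fragment is encapsulated with its own IP header.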

Field	Size (bits)	Purpose
Identification	16	Unique identifier for the original datagram; all fragments of the same datagram share this value.
Flags	3	Control bits for fragmentation. DF (Don't Fragment): if set to 1, routers must not fragment the packet. MF (More Fragments): if set to 1, more fragments follow; if 0, this is the last fragment.
Fragment Offset	13	Indicates the position of this fragment's data in 8‑byte units relative to the start of the original datagram.
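
The interaction of these fields can be sketched in C by planning the fragments for a given payload and MTU. The names `fragment` and `fragment_plan` are hypothetical, and the sketch assumes a fixed 20-byte header without options and a payload that fits in one IPv4 datagram.

```c
#include <stddef.h>
#include <stdint.h>

struct fragment {
    uint16_t offset_units; /* Fragment Offset field, in 8-byte units */
    uint16_t data_len;     /* payload bytes carried by this fragment */
    int more_fragments;    /* MF flag */
};

/* Split `payload_len` bytes of IP payload for a link with the given MTU.
 * Returns the number of fragments written to `out`, or 0 if the MTU is too
 * small or the fragments do not fit in `max_out` entries. */
size_t fragment_plan(uint32_t payload_len, uint32_t mtu,
                     struct fragment *out, size_t max_out)
{
    const uint32_t header_len = 20; /* assumption: no IP options */
    if (mtu < header_len + 8)
        return 0;
    /* Every fragment except the last must carry a multiple of 8 bytes. */
    uint32_t max_data = ((mtu - header_len) / 8) * 8;
    size_t n = 0;
    uint32_t done = 0;
    while (done < payload_len) {
        if (n == max_out)
            return 0;
        uint32_t chunk = payload_len - done;
        if (chunk > max_data)
            chunk = max_data;
        out[n].offset_units = (uint16_t)(done / 8);
        out[n].data_len = (uint16_t)chunk;
        out[n].more_fragments = (done + chunk < payload_len);
        done += chunk;
        n++;
    }
    return n;
}
```

For a 3980-byte payload and an MTU of 1500, this yields three fragments carrying 1480, 1480, and 1020 bytes at offsets 0, 185, and 370, with MF set on all but the last.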

Dynamic Host Configuration Protocol (DHCP) [Application Layer]

Step	Message	IP Header (Src → Dst)	Ethernet (Src → Dst)	Purpose
1	DISCOVER	`0.0.0.0` → `255.255.255.255`	`Client MAC` → `FF:FF:FF:FF:FF:FF`	Client broadcasts to find DHCP servers
2	OFFER	`Server's IP` → `255.255.255.255`	`Server MAC` → `FF:FF:FF:FF:FF:FF`	Server proposes IP configuration
3	REQUEST	`0.0.0.0` → `255.255.255.255`	`Client MAC` → `FF:FF:FF:FF:FF:FF`	Client accepts the offer
4	ACK	`Server's IP` → `255.255.255.255`	`Server MAC` → `FF:FF:FF:FF:FF:FF`	Server confirms and finalizes lease

Address Resolution Protocol (ARP)

Every network interface has a MAC (Media Access Control) address, which operates at the link layer and is intended to be unique.

ARP resolves IP addresses to MAC addresses on the local link. A node broadcasts an ARP request, receives a unicast reply from the owner of the IP address, and caches the mapping in its ARP table.

Internet Control Message Protocol (ICMP)

When an error occurs, an ICMP error report is sent back to the source IP address and the problematic packet is discarded. The source host must then rectify the issue.

Message Format
- IP Header: Src = router, Dst = A, Protocol = 1
- ICMP Header: Type = X, Code = Y
- ICMP Data: Src = A, Dst = B, ... (the start of the offending packet)

Type

Name	Type/Code	Usage
Dest. Unreachable (Net or Host)	`3/0` or `3/1`	Lack of connectivity
Dest. Unreachable (Fragment)	`3/4`	Path MTU Discovery
Time Exceeded (Transit)	`11/0`	Traceroute ($TTL = 0$)
Echo Request or Reply	`8/0` or `0/0`	Ping

Traceroute

Traceroute sends probe packets with incrementing TTL. Each router that decrements TTL to zero replies with an ICMP Time Exceeded error, revealing its address.

Network Address Translation (NAT)

NAT maps multiple private IP:port pairs to a single public IP with unique external ports via a stateful translation table, enabling many internal devices to share one external address.

CUHKSZ CSC Notes