Networking mini tutorial

This is a mini crash course on the networking concepts which are used in our course but are not specific to microcontrollers.

Networking allows devices connected to the network to exchange information. Measurements from a remote sensor can be sent to a computer for analysis, data from a remote database can be requested, or, more generally, data from any remote service connected to the internet can be requested.

For this the data needs to be packed together with some meta-information which defines where the data has to be sent. This packaging and the data transfer itself are done by the networking software and hardware. The software implements a set of protocols which make the above possible.

Networking layers

Data is sent around the internet in packets of limited size. This prevents one gigantic transfer from blocking a network line for a long time. Because packets are limited in size, packets from different transfers can be sent alternately on the same physical line, so that multiple transfers are ongoing at the same time. How the packets are transferred to their destination is defined by a set of networking protocols. These protocols are "rules" for how the data has to be packaged. Each data packet starts with a header containing the meta-information the protocol needs to do its job. Usually multiple protocols are involved in the transfer and hence multiple headers are chained one after the other at the start of each data packet. Each protocol performs a different function. The figure below shows a typical TCP/IP packet on an Ethernet network. Three protocols (TCP, IP and Ethernet) are involved and hence three headers are packed in front of the "payload data", i.e. the data we are really interested in. If the amount of data to be transferred is large, many of these packets need to be sent in a chain.

Typical Network Packet
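
The chaining of headers can be pictured with a few lines of Python. This is only a toy sketch: the header contents ([TCP], [IP], [ETH]) and the helper name wrap are placeholders invented for illustration and do not reflect the real header layouts.

```python
# Toy illustration of header chaining. The header contents are simplified
# placeholders, not real Ethernet/IP/TCP header layouts.

def wrap(header: bytes, data: bytes) -> bytes:
    """Put one protocol header in front of the data handed down from above."""
    return header + data

payload = b"temperature=21.5"            # the data we are really interested in
tcp_segment = wrap(b"[TCP]", payload)    # the TCP header is added first
ip_packet = wrap(b"[IP]", tcp_segment)   # then the IP header
ethernet_frame = wrap(b"[ETH]", ip_packet)  # finally the Ethernet header

print(ethernet_frame)  # b'[ETH][IP][TCP]temperature=21.5'
```

The header which was added last (Ethernet) ends up at the very front of the packet, which is the order in which the receiving side reads and removes the headers again.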

Typical Network Stack

Very often networking on the internet is done with TCP/IP. These are in fact two different protocols: TCP is the so-called transport layer protocol and IP is the Internet Protocol. These two protocols are used on top of a so-called data link layer protocol. This is the protocol which actually transfers data packets over the hardware medium, i.e. it deals with the networking hardware. In home, office and laboratory networks this layer is usually Ethernet for cable (or fibre) connections and Wi-Fi for wireless connections.

The functions of the TCP and IP protocols are described in the following sections. They are essential for transferring network packets from the source over the internet to the destination.

IP (Internet Protocol)

The purpose of the internet is to let computers communicate with each other. For this to happen, computers must be addressable so that you can choose which computer you want to "talk" to. The addresses which computers get on the Internet are called IP addresses (IP: the Internet Protocol). The Internet Protocol defines the format of the addresses of communicating devices. An address is a simple 4-byte number. By convention each byte is written as a decimal number and the resulting four numbers are separated by dots (e.g. 133.231.43.136). From this you can calculate how many different addresses exist in the world: each byte can take 256 different values, so there are 256 * 256 * 256 * 256 = 2^32 = 4294967296 addresses. Even though this number is large, the world is running out of addresses. This is why a new addressing scheme (IPv6) has been developed which allows for many more addresses. It is not yet used very widely but it is being deployed.
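
The relation between the dotted notation and the underlying 4-byte number can be checked with the ipaddress module of the Python standard library. This is a small sketch using the example address from the text:

```python
import ipaddress

addr = ipaddress.IPv4Address("133.231.43.136")

as_bytes = addr.packed   # the four bytes of the address
as_int = int(addr)       # the same address seen as one 32-bit integer

print(list(as_bytes))    # [133, 231, 43, 136]
print(as_int == (133 << 24) + (231 << 16) + (43 << 8) + 136)  # True

# Each of the four bytes can take 256 values:
print(256 ** 4)          # 4294967296
```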

The internet is divided into a hierarchy (i.e. a tree structure) of subnets. This avoids the need for one worldwide computing centre which is responsible for distributing all IP addresses to all network devices in the world. The subnets are connected by so-called "routers". A router allows packets to travel from one subnet to another. Routers know the address ranges of the connected subnets and with this information they can forward the packets to the correct subnet. CERN provides an example of subnets: CERN "owns" the subnets 137.138.0.0/16 and 128.141.0.0/16. The number after the '/' defines the number of leading bits which identify the subnet. /16 means that the first 16 bits are the same for all addresses of the subnet. (In this case this corresponds to the first two bytes, 137 and 138 or 128 and 141.) This means that all addresses of the form 137.138.x.y or 128.141.x.y are reserved for CERN and CERN can distribute these IP addresses to any internal device. In principle these devices are then accessible from anywhere in the world (we do not consider the existence of firewalls here). The distribution of the addresses in these subnets is handled by the IT division of CERN. In the outside world, however, the big routers of the Internet only know that these subnets are CERN subnets, and they forward packets which have addresses in the ranges above to CERN routers (these are devices with a known IP address on the internet). Within CERN the packets are then distributed by the internal "routers" and "switches" to the destination. (In contrast to a router, which connects different subnets, a "switch" connects devices in the same subnet.) There is no need for all the routers in the world to know the address of the PC in my office. This is the same concept as routing traditional mail: first the letters are brought to the country, within the country to the city, and finally to the precise destination address.

The following (dramatically simplified) figure shows the principle of how the Internet is structured into a hierarchy of subnets:
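
The meaning of the /16 notation can be tried out with the Python standard library. The sketch below uses the two CERN subnets named above; the helper name is_cern_address and the tested addresses are made up for illustration.

```python
import ipaddress

cern_a = ipaddress.ip_network("137.138.0.0/16")
cern_b = ipaddress.ip_network("128.141.0.0/16")

def is_cern_address(text):
    """Check whether an address falls into one of the two subnets above."""
    addr = ipaddress.ip_address(text)
    return addr in cern_a or addr in cern_b

# 16 bits identify the subnet, so 32 - 16 = 16 bits remain for devices.
print(cern_a.num_addresses)              # 65536
print(is_cern_address("137.138.12.34"))  # True
print(is_cern_address("137.139.12.34"))  # False: the second byte differs
```

A router makes essentially this kind of comparison (on the leading bits only) when it decides to which subnet a packet has to be forwarded.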

The (simplified) structure of the Internet

The internet is divided into a set of large subnets, each of which is handled by a Network Service Provider (NSP). Packets can be transferred from one of the large subnets to another via IXPs (Internet Exchange Points). IXPs are implemented by NAPs (Network Access Points) or MAEs (Metropolitan Area Exchanges). Both are large network routers which allow packets to be transferred from one NSP to another. (MAEs are commercial services whereas NAPs are public services.) Each NSP subnet is divided into several regional ISPs (Internet Service Providers) which in turn are divided into several local ISPs. So each of these nodes (ISPs or NSPs) sits on top of a large subnet. Each node has many links "downwards", each of which corresponds to a "subnet of the large subnet". The same is valid for all nodes of the hierarchy. In addition each node has an "uplink" to a higher level ISP or NSP (or to several NAPs/MAEs at the highest level). (The whole thing is a tree diagram where each node has one or a few uplinks and many "downlinks". The node is essentially a router connecting all the subnets.) When a packet has a destination address in a different subnet, the packet is sent to the uplink, one step higher in the hierarchy. This continues until an ISP/NSP detects that the address belongs to one of the subnets of its own hierarchy. Then the packet is sent down in this hierarchy until the final destination is found.

Without going into the details, below is a diagram showing the complete IP header of an IP network packet. As you can see there are many fields in addition to the source and destination address. The data field is a placeholder for the data your application wants to send, and of course it can be of variable length.

Internet Protocol Header
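
For readers who like to see the header as bytes, here is a minimal sketch which builds and decodes the fixed 20-byte part of an IPv4 header with the struct module. The function names and the example addresses are chosen for illustration, header options are ignored, and the checksum is simply left at 0 instead of being computed.

```python
import socket
import struct

# Fixed 20-byte IPv4 header (no options), all fields in network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def build_header(src, dst, payload_length, protocol=6, ttl=64):
    """Build a header for illustration; the checksum is left at 0."""
    version_and_length = (4 << 4) | 5  # version 4, header length 5 * 4 bytes
    total_length = IPV4_HEADER.size + payload_length
    return IPV4_HEADER.pack(
        version_and_length, 0, total_length,
        0, 0,              # identification, flags + fragment offset
        ttl, protocol, 0,  # time to live, protocol (6 = TCP), checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )

def parse_header(raw):
    """Decode the fields which matter most for this tutorial."""
    fields = IPV4_HEADER.unpack(raw[:IPV4_HEADER.size])
    return {
        "version": fields[0] >> 4,
        "header_bytes": (fields[0] & 0x0F) * 4,
        "total_length": fields[2],
        "ttl": fields[5],
        "protocol": fields[6],
        "source": socket.inet_ntoa(fields[8]),
        "destination": socket.inet_ntoa(fields[9]),
    }

header = build_header("192.168.1.10", "137.138.12.34", payload_length=100)
print(parse_header(header))
```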

Private subnets

A few subnets from the IP address space are reserved for private use (i.e. everybody can use these subnets but the IP addresses of these subnets are never exposed to the public internet). These subnets are:

10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
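
A short sketch shows which address ranges these three subnets cover, again using the ipaddress module. The helper name in_private_subnet and the tested addresses are made up for illustration.

```python
import ipaddress

PRIVATE_SUBNETS = [
    ipaddress.ip_network(text)
    for text in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def in_private_subnet(text):
    """True if the address belongs to one of the three subnets listed above."""
    addr = ipaddress.ip_address(text)
    return any(addr in net for net in PRIVATE_SUBNETS)

for net in PRIVATE_SUBNETS:
    # net[0] and net[-1] are the first and the last address of the range.
    print(net, "covers", net[0], "to", net[-1])

print(in_private_subnet("192.168.1.20"))   # True
print(in_private_subnet("172.32.0.1"))     # False: just outside the /12 range
print(in_private_subnet("137.138.12.34"))  # False: a public address
```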

Since these addresses are never exposed to the "outside world", many local subnets with the same address range can exist without disturbing each other. Why this is useful is easy to see. Imagine your home network: you have a computer and a mobile phone, but your brother, your sister and your grandmother also have computers. Everybody wants to be connected to the internet. However, your ISP only gives you one single IP address since you only paid for one single internet contract. What now?

You have a wireless router at home which connects directly to your ISP (e.g. via the ADSL phone line or via a fibre coming into your house). It is a router since it connects two different subnets: your home subnet with all the devices you have, and the subnet of the ISP. This router has a single "uplink" (like the nodes discussed above) and many downlinks to various network devices (often these routers have 4 sockets for cables, but they also provide a wireless network to which you can connect many different devices). All devices in your house form a single subnet which is connected to your router (either via Wi-Fi or via a cable). It is your router which distributes the IP addresses for these devices. The IP addresses it distributes are from one of the private subnets mentioned above (often the 192.168.0.0/16 subnet is used). This already ensures that all devices on your home network can communicate with each other. (Your wireless router is also a "switch", i.e. it also connects the devices in your home subnet. Hence it has switching and routing capabilities.) The IP address which the ISP gives the router when you power it up is associated with the network interface of your router which connects to the ISP (i.e. your "uplink"). How can a device from the private subnet now communicate with the outside internet?

Imagine you send a request to an outside IP address: your router realises that the request is not for a device in its own subnet. Like the big nodes (i.e. routers) on the Internet, it sends the packet to the uplink (also called the gateway) to your ISP, which is on the internet. But before it sends the packet out, it has to substitute the source address (i.e. the address of the device which sends the request) with the IP address which the router has received from the ISP. Hence the ISP thinks that the request comes from the IP address it has given to your router. The router internally memorises that it has substituted the source address for this request. When the answer then comes "from the internet", it does the reverse: it substitutes the destination address (which is the address of the router obtained from the ISP) with the original address of the device which sent the request. Fields in the packet headers (typically the port numbers of the transport protocol, see the TCP section below) allow the router to determine that the incoming answer belongs to this specific request. Therefore this method also works if the entire family browses the web on all computers at the same time. Once the destination address has been substituted with the address of the device which waits for this network packet, your router hands the packet over to the local network, where the "switch" delivers the packet to the destination.

This procedure is called Network Address Translation (NAT). It is essential for the functioning of our home networks.
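
The bookkeeping a NAT router has to do can be sketched as a small translation table. This is a simplified model under stated assumptions: replies are matched to requests via port numbers (a common approach), the class name ToyNat and all addresses are invented for illustration (the public ones are taken from ranges reserved for documentation), and real routers additionally handle timeouts, several protocols and many special cases.

```python
class ToyNat:
    """Toy model of the address substitution done by a home router."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_private = {}  # (private ip, private port) -> public port
        self.by_public = {}   # public port -> (private ip, private port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of a packet leaving the home network."""
        key = (src_ip, src_port)
        if key not in self.by_private:
            # Memorise the substitution so that the answer can be mapped back.
            self.by_private[key] = self.next_port
            self.by_public[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_private[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the destination of an answer arriving from the internet."""
        private_ip, private_port = self.by_public[dst_port]
        return (src_ip, src_port, private_ip, private_port)

nat = ToyNat("203.0.113.7")
# Two home devices talk to the same server at the same time.
print(nat.outbound("192.168.1.10", 51000, "198.51.100.5", 443))
print(nat.outbound("192.168.1.11", 51000, "198.51.100.5", 443))
# The answer to the second device is mapped back to its private address.
print(nat.inbound("198.51.100.5", 443, "203.0.113.7", 40001))
```

Because each outgoing connection gets its own entry in the table, the answers can be told apart even though both requests appear to come from the same public IP address.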

DHCP (Dynamic Host Configuration Protocol)

There are several ways to get an IP address. Normally your computer is part of a bigger university campus, or you have an internet contract with a telecommunication company, and so the campus IT division or the company "gives you" an IP address. You can request a "static" IP address (which never changes) but you will have to pay quite a bit of money for it (companies do this). Normally your computer requests an IP address from your Internet Service Provider (ISP) with a protocol called DHCP. This IP address might change every time you switch on your router (and sometimes it also changes spontaneously after some days or weeks). Essentially the DHCP protocol works like this:

The computer asks for an IP address when it is booting. Since it has no idea of the address of the server distributing the IP addresses, it uses a special reserved address for this, called the "broadcast address" (255.255.255.255, the highest possible address). The broadcast address is a special address which all devices on the network listen to.
The server gets the broadcast message and offers the client an IP address. (Other devices on the network also get the broadcast message but they discard it since they do not run a DHCP service.) Together with the offer, the server sends other useful information like the subnet mask and the DNS and gateway addresses to the client (see below).
The client receives the reply and then checks that nobody else has the address (for safety). This is done with the so-called ARP (Address Resolution Protocol), which we do not discuss further. If nobody else has the address, the client tells the server in another message that it accepts the offered IP address.
The server finally acknowledges the client's request and memorises the address it has given to this client.

DHCP addresses are given out only for a finite amount of time (the so called lease time, e.g. 12 hours) after which they need to be refreshed by the client. (This is all done behind the scenes by the networking software of the computer.)
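
The exchange described above can be mimicked with a toy server which hands out addresses from a pool. This is only a sketch of the bookkeeping, not of the real protocol messages: the class ToyDhcpServer, its method names, the client identifier and the choice of the first host address as the gateway are all assumptions made for illustration.

```python
import ipaddress

class ToyDhcpServer:
    """Toy model of the address bookkeeping of a DHCP server."""

    def __init__(self, network, lease_seconds=12 * 3600):
        hosts = list(ipaddress.ip_network(network).hosts())
        self.gateway = hosts[0]   # assumption: the router uses the first address
        self.free = hosts[1:]     # addresses which can still be handed out
        self.lease_seconds = lease_seconds
        self.offers = {}          # client id -> offered address
        self.leases = {}          # client id -> (address, expiry time)

    def discover(self, client_id):
        """The broadcast of a client arrives: offer a free address."""
        address = self.free.pop(0)
        self.offers[client_id] = address
        return {"address": address, "gateway": self.gateway,
                "lease_seconds": self.lease_seconds}

    def request(self, client_id, address, now):
        """The client accepts the offer: memorise the lease."""
        if self.offers.get(client_id) != address:
            return False          # this address was never offered to the client
        del self.offers[client_id]
        self.leases[client_id] = (address, now + self.lease_seconds)
        return True

server = ToyDhcpServer("192.168.1.0/24")
offer = server.discover("my-laptop")
accepted = server.request("my-laptop", offer["address"], now=0)
print(offer["address"], accepted, server.leases["my-laptop"][1])
```

The stored expiry time models the lease: once it has passed, the client has to refresh the address or the server may hand it out to somebody else.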

Gateway

As explained before, a home router is connected to many devices on the private subnet and it has one "uplink" to the ISP. Network packets which come from the private network and have a destination address within the private network do not leave the private network. However, the router only knows the addresses within the private network. If a packet has to go elsewhere, the router a priori does not "know" where the destination address is. It then sends the packet through the uplink to the ISP. The uplink is called the gateway.
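
The decision described above boils down to a single membership test. The following sketch assumes a home subnet 192.168.1.0/24 and a made-up gateway address; the function name next_hop is chosen for illustration.

```python
import ipaddress

def next_hop(destination, local_network, gateway):
    """Decide whether a packet stays in the local subnet or goes to the gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        return "deliver locally to " + str(dest)
    return "forward to gateway " + gateway

print(next_hop("192.168.1.42", "192.168.1.0/24", "203.0.113.1"))
# deliver locally to 192.168.1.42
print(next_hop("137.138.12.34", "192.168.1.0/24", "203.0.113.1"))
# forward to gateway 203.0.113.1
```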

DNS (Domain name system)

For human beings it is hard to remember IP addresses and therefore a human-friendly addressing scheme was invented: the domain names. These are the addresses you type into your browser (e.g. infn.it, google.com, cern.ch, ...). Before your browser's request can be sent to the internet, the domain name needs to be translated into an IP address. This job is done by the DNS. The DNS servers in the outside world have a protocol among each other so that they are up to date on all registered domain names, and hence they can translate all existing domain names to IP addresses. The IP address of the DNS server itself needs to be known to a computer. Usually it is given to the computer in the DHCP reply; however, it can also be configured manually.

On Linux you can use the "dig" command to send out DNS requests. If I type "dig google.ch" I get the answer that the google.ch domain is at the address 142.250.203.99 and all requests to {anything}.google.ch will be routed there. {anything} further specifies where within Google the packet should be routed, and this is handled by the internal IT network of Google.
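
The same kind of lookup can be done from a program. The sketch below asks the resolver of the operating system via socket.getaddrinfo from the Python standard library; the function name resolve_ipv4 is chosen for illustration.

```python
import socket

def resolve_ipv4(name):
    """Return the IPv4 addresses the system resolver reports for a name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry ends with a socket address of the form (ip address, port).
    return sorted({info[4][0] for info in infos})

# A name which already is an IP address is returned without any DNS request.
print(resolve_ipv4("127.0.0.1"))  # ['127.0.0.1']
```

Calling resolve_ipv4("cern.ch") requires a working network connection and a configured DNS server, and the addresses returned for a given name can change over time.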

The TCP protocol

TCP (Transmission Control Protocol) is a very sophisticated protocol which defines how the transfer of messages over the network should be handled. It also defines how network packets entering a computer are delivered to the correct application (in a single computer multiple applications with a connection to the network might run in parallel: your web browser and an email client, for example).

Network packets are delivered to your computer with the help of the IP protocol. However, there must then be a way for the computer to decide to which application a network packet has to be delivered. This is done by the TCP protocol. The TCP protocol defines a port number (a 16-bit number). A number of applications (or more precisely application protocols) are associated with fixed port numbers. Examples are:

port 80 for HTTP (Web browsers) 
port 443 for HTTPS (also Web browsers)
port 20 and 21 for FTP (File Transfer Protocol)
port 110 for POP3 (email clients)

An application which exchanges data via a specific application protocol knows the port number for this protocol and puts this port number into the corresponding field of the TCP header (there is a field for the source port and the destination port).
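
The two port fields are the first four bytes of the TCP header: 16 bits for the source port followed by 16 bits for the destination port. A minimal sketch with the struct module (the function names and the source port 51000 are made up for illustration):

```python
import struct

# Some of the fixed port numbers listed above.
WELL_KNOWN_PORTS = {"HTTP": 80, "HTTPS": 443, "FTP": 21, "POP3": 110}

def pack_ports(source_port, destination_port):
    """Encode the two port fields as they appear at the start of a TCP header."""
    return struct.pack("!HH", source_port, destination_port)

def unpack_ports(tcp_header_start):
    """Decode the source and destination port from the first four bytes."""
    return struct.unpack("!HH", tcp_header_start[:4])

# A browser using the (arbitrary) source port 51000 contacts an HTTPS server.
raw = pack_ports(51000, WELL_KNOWN_PORTS["HTTPS"])
print(len(raw))           # 4
print(unpack_ports(raw))  # (51000, 443)
print(2 ** 16)            # 65536 possible values for a 16-bit port number
```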

In addition TCP splits big chunks of data into smaller packets and sends them one after the other to the network. This prevents the network from being blocked for a long time because somebody has the idea to transfer many gigabytes of data.

TCP guarantees that

All packets will arrive at the destination (if a packet is not acknowledged by the receiver within a given time, it will be resent until the acknowledgement arrives).
All packets arrive in the correct order.
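
Both guarantees can be illustrated with a toy simulation: the data is cut into numbered segments, the "network" shuffles them and drops some of them once, and the lost segments are sent again until everything has arrived. This is a strongly simplified sketch and not the real TCP algorithm; the function names, the segment size and the choice of lost segments are assumptions made for illustration.

```python
import random

def segment(data, size):
    """Cut the data into (offset, chunk) pairs of at most size bytes."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]

def deliver(segments, lost_once, rng):
    """Send segments over a toy network which reorders and loses packets.

    Segments whose offset is in lost_once are dropped the first time they
    are sent; they are never acknowledged and are therefore sent again.
    """
    received = {}
    to_drop = set(lost_once)
    pending = list(segments)
    while pending:
        rng.shuffle(pending)          # packets may arrive in any order
        unacknowledged = []
        for offset, chunk in pending:
            if offset in to_drop:
                to_drop.discard(offset)
                unacknowledged.append((offset, chunk))  # must be resent
            else:
                received[offset] = chunk
        pending = unacknowledged
    # The offsets play the role of sequence numbers and restore the order.
    return b"".join(received[offset] for offset in sorted(received))

data = bytes(range(256)) * 4
result = deliver(segment(data, 100), lost_once={0, 300}, rng=random.Random(1))
print(result == data)  # True
```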

In addition TCP contains sophisticated mechanisms to optimise the throughput. When you send large blocks of data, it adjusts the rate at which packets are sent such that the optimal throughput can be achieved. This is done by monitoring how often a packet gets lost (i.e. is not acknowledged) and then reducing the packet rate to avoid a high packet loss rate, which would require a lot of packet retransmissions.
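
A classic scheme for this kind of rate adjustment is "additive increase, multiplicative decrease": the sender slowly increases the number of packets it keeps in flight and cuts it sharply when a loss is detected. The sketch below shows the principle only; the function name and the parameter values are assumptions, and real TCP implementations use considerably more refined algorithms.

```python
def next_window(window, packet_lost, increase=1.0, decrease=0.5, minimum=1.0):
    """Return the new number of packets which may be in flight at once."""
    if packet_lost:
        return max(minimum, window * decrease)  # back off sharply after a loss
    return window + increase                    # otherwise probe for more

window = 1.0
history = []
# One entry per round trip: was a packet lost in this round?
for lost in [False, False, False, True, False, False]:
    window = next_window(window, lost)
    history.append(window)

print(history)  # [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
```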

Without going into the details here you see a diagram of a complete TCP header:

TCP header

OSI model

Finally, in the literature you often see diagrams like the following to symbolise the path of a TCP/IP packet. These diagrams illustrate the OSI (Open Systems Interconnection) model, which divides the network communication functions into seven layers. However, this abstract model does not always fit the different network protocol stacks well.

The columns with the various protocols symbolise the processing of the header information in the sending and the receiving computer. The sending computer works from layer 7 down to layer 1 and the receiving computer in the opposite direction. (But not all layers necessarily have their own header.) One point often causes confusion: the layer 3 and layer 2 header information is also heavily used by the network equipment while the data is routed through the internet. It is not only used in the sending and receiving computer.

The application protocol is handled by the application (i.e. the email client or the web browser). The presentation layer handles encryption and compression. The session layer handles multiple data exchanges over an established connection, which is called a session (examples are NFS, the Network File System, and RPC, Remote Procedure Calls). The transport layer corresponds, for example, to the TCP protocol and the network layer to the IP protocol. Both are processed in the networking stack of the operating system. The protocol which deals with the underlying networking hardware (e.g. Ethernet or Wi-Fi) is processed partly in software by the operating system drivers and partly by the networking hardware. This layer is called the data link layer in the OSI model.

TCP/IP