Memory management
Introduction
Computer programs require memory to hold the program itself and to hold the data the program works on. This includes variables, pointers, data structures needed to run the program, and blocks of data (e.g. sound files, images, contents of documents, measurements, results of calculations...). Some of this memory needs to be available throughout the entire program life cycle, while other parts are only needed during specific periods of the program's life cycle. Therefore two memory allocation techniques exist, and often a mixture of the two is used in a program.
Static memory allocation
Statically allocated memory is memory allocated (=reserved) at compile time. This means the amount of memory needed for a specific data block is known before the program is started, and the compiler is instructed to generate code that reserves this specific amount of memory in the RAM of the computer. This memory then stays reserved throughout the life cycle of the program (i.e. for the time the program is running).
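As a minimal sketch in C (the buffer name and size are invented for illustration), a statically allocated buffer looks like this:

```c
#include <stdint.h>

/* Reserved at compile/link time: this 8 kB buffer exists for the whole
 * lifetime of the program, whether it is currently needed or not. */
static uint8_t measurement_buffer[8 * 1024];

int main(void)
{
    measurement_buffer[0] = 42;   /* the memory is guaranteed to be there */
    return 0;
}
```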
This technique is the "safest" and most predictable way of allocating memory. When the program starts successfully, you know that it found all the required memory, and it cannot fail later because memory runs out. This is why this method is sometimes the only memory allocation method allowed for programs with safety-critical or life-preserving functionalities, where failure of the program could have disastrous consequences.
However, static memory allocation is not the most efficient way of allocating memory. If you need a chunk of memory only for a short period in your program, it is wasteful to allocate it at the start of the program (and, more specifically, you would have to allocate the amount of memory needed in the "worst case" scenario) and keep it reserved throughout the entire program life cycle. It would be better if the memory could be "recycled" and used for other purposes. This is why there is a second way of allocating memory in a program:
Dynamic memory allocation
Dynamically allocated memory is requested and reserved while the program is running, and it can be given back (freed) to the system once it is no longer needed. The memory is then available for future dynamic allocations in the same program. For example, a subroutine might need a large buffer to read a JPEG file and do some processing on it; once this is finished, the memory can be released and used by some other activity in the program.
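A minimal sketch in C (the buffer size is invented for illustration) using the standard malloc()/free() functions:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Request a 1 MiB working buffer only for as long as it is needed. */
    size_t size = 1024u * 1024u;
    unsigned char *buffer = malloc(size);
    if (buffer == NULL) {
        /* The system could not find a large enough free block. */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... read and process the data using the buffer ... */

    free(buffer);   /* give the memory back for future allocations */
    return 0;
}
```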
This technique is more efficient since no large, reserved-but-unused memory blocks hang around during program execution. However, it might happen that when the program wants to allocate a large block of memory, the system cannot find a large enough free block; in that case the memory allocation, and consequently the program, will fail.
Fragmentation
Imagine a computer with many programs running at the same time for a long time. Imagine that all these programs dynamically allocate and free memory blocks of different sizes continuously during the time they are running. What will happen to the memory of the computer?
In the beginning (i.e. after booting, before the applications are started) there is one large chunk of free memory from which the applications can dynamically allocate. When an application allocates a block of memory of a given size, it takes it from the beginning of this free chunk. Then another block is allocated, and another, and so on... At the same time, some memory blocks are freed and re-appear as free blocks in the system. Since such a freed chunk now sits in the middle of already used memory, the system can only hand it out again if the requested block is smaller than or equal to the freed chunk.

If you imagine how this continues (or if you write a little Python program to simulate it), you will see that the memory fragments over time: many reserved and free memory blocks end up scattered around the entire memory space. If an application now requests a large block of memory, there is a fair chance that no such block can be found, even though the total amount of free memory is much larger than the requested block size. The problem is that the free memory is scattered around and not concentrated in a single block. This is what is called memory fragmentation. A small simulation sketch is given below, and the following figures illustrate the concept.
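In the spirit of the little simulation program mentioned above, here is a rough C sketch. All constants (number of slots, number of steps, block sizes) are arbitrary; the heap is modelled as an array of equal slots and blocks are placed with a simple first-fit search.

```c
#include <stdio.h>
#include <stdlib.h>

#define HEAP_SLOTS 1000          /* heap modelled as 1000 equal slots    */
#define STEPS      100000        /* number of random alloc/free actions  */

static int owner[HEAP_SLOTS];    /* 0 = free, otherwise id of the block  */

/* First fit: find 'len' consecutive free slots and mark them with 'id'. */
static int alloc_block(int id, int len)
{
    int run = 0;
    for (int i = 0; i < HEAP_SLOTS; i++) {
        run = (owner[i] == 0) ? run + 1 : 0;
        if (run == len) {
            for (int j = i - len + 1; j <= i; j++) owner[j] = id;
            return 1;
        }
    }
    return 0;                    /* no contiguous run of free slots found */
}

static void free_block(int id)
{
    for (int i = 0; i < HEAP_SLOTS; i++)
        if (owner[i] == id) owner[i] = 0;
}

int main(void)
{
    int next_id = 1;
    for (int s = 0; s < STEPS; s++) {
        if (rand() % 2)
            alloc_block(next_id++, 1 + rand() % 16);   /* random size 1..16 */
        else if (next_id > 1)
            free_block(1 + rand() % (next_id - 1));    /* free a random block */
    }

    /* Compare total free memory with the largest contiguous free block. */
    int free_total = 0, largest = 0, run = 0;
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (owner[i] == 0) { free_total++; run++; if (run > largest) largest = run; }
        else run = 0;
    }
    printf("free slots: %d, largest contiguous run: %d\n", free_total, largest);
    return 0;
}
```

Typically the largest contiguous run is much smaller than the total number of free slots, which is exactly the fragmentation effect described above.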



Memory Management Unit (MMU)
Computer designers have developed strategies to avoid memory fragmentation. A very efficient way to avoid this problem is to use a Memory Management Unit (MMU). This is a piece of hardware that works very fast and hence does not slow down the software during memory accesses and allocations.
The MMU requires the entire memory to be divided into blocks (typically 4 kB) called pages. The minimal amount of memory which can be allocated is one page. The MMU maintains a table of all pages in the system and keeps track of which application has allocated which page and which pages are free. In addition, the MMU can translate the addresses of the pages (using very fast lookup tables in hardware). This means that the addresses of the pages in the memory chips of the computer are not the same ones which the software application sees; they are translated to an arbitrary 4 kB range which is defined at allocation time. The translated address seen by the application is called the virtual address, the address in the memory chips the physical address.

To see why this is extremely useful, imagine the following scenario. An application wants to allocate 16 kB (i.e. 4 pages) to hold a data array. Obviously it needs 4 consecutive pages with increasing addresses for this. The system has many free pages, but due to memory fragmentation there are no 4 consecutive pages available. In a system without an MMU the allocation would fail and the program would crash. The MMU, however, finds four free pages which are scattered around in the address space of the memory chips. It then programs the lookup tables for these pages such that the memory space they occupy looks consecutive to the application. Essentially this is done by adding a different offset (positive or negative) to the addresses of each memory page. The four offsets are chosen such that the address used by the software application, added to the offset, results in the physical address of the page in the memory chips. This adding of the offset is done in hardware and is not noticeable by the software.
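To make the translation idea concrete, here is a minimal sketch of the 16 kB scenario. The frame numbers in frame_of_vpage are invented for illustration; they stand for four scattered physical pages that the application nevertheless sees as one consecutive 16 kB range.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical mapping for a 16 kB allocation: four consecutive
 * virtual pages mapped onto four scattered physical page frames. */
static const uint32_t frame_of_vpage[4] = { 37, 5, 122, 64 };

static uint32_t virtual_to_physical(uint32_t vaddr)
{
    uint32_t vpage  = vaddr / PAGE_SIZE;   /* which virtual page       */
    uint32_t offset = vaddr % PAGE_SIZE;   /* byte within that page    */
    return frame_of_vpage[vpage] * PAGE_SIZE + offset;
}

int main(void)
{
    /* Virtual addresses 0..16383 look consecutive to the application,
     * but land in non-adjacent physical pages. */
    for (uint32_t v = 0; v < 4 * PAGE_SIZE; v += PAGE_SIZE)
        printf("virtual 0x%05x -> physical 0x%06x\n", v, virtual_to_physical(v));
    return 0;
}
```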
The table with the offsets for each page is called the Translation Lookaside Buffer (TLB). To avoid slowing down memory accesses, the TLB itself is placed in very fast memory close to the MMU. In a system with a lot of memory this table becomes large, and it would be expensive to provide enough fast memory to hold the entire table. This is why only a selection of the pages is contained in the TLB, more specifically the pages which are currently being used. The complete page table with all pages is kept in the main computer RAM. When a program accesses a page which is not currently in the TLB (a miss), the computer reads the necessary information from the page table in RAM. This causes a small delay. However, on this occasion the page is also loaded into the TLB (a less often used page may have to be thrown out if the TLB memory is already completely used) so that the next access to the page will be quick again. This procedure is a caching algorithm as discussed in the Overview section.
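A rough sketch of this caching behaviour, assuming a tiny direct-mapped TLB; all sizes and the page-table contents are invented for illustration (real TLBs are larger and usually associative, and the lookup is done in hardware, not software):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define NUM_PAGES   1024         /* pages covered by the full page table */
#define TLB_ENTRIES 8            /* tiny, illustrative TLB               */

/* Full page table, held in (slow) main RAM: virtual page -> physical frame. */
static uint32_t page_table[NUM_PAGES];

/* Direct-mapped TLB held in fast memory: each virtual page may be cached
 * in exactly one slot. */
static struct { int valid; uint32_t vpage; uint32_t frame; } tlb[TLB_ENTRIES];

static int hits = 0, misses = 0;

static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpage  = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    int slot = (int)(vpage % TLB_ENTRIES);

    if (tlb[slot].valid && tlb[slot].vpage == vpage) {
        hits++;                                  /* fast path: TLB hit      */
    } else {
        misses++;                                /* slow path: TLB miss,    */
        tlb[slot].valid = 1;                     /* read the entry from the */
        tlb[slot].vpage = vpage;                 /* page table in RAM and   */
        tlb[slot].frame = page_table[vpage];     /* cache it in the TLB     */
    }
    return tlb[slot].frame * PAGE_SIZE + offset;
}

int main(void)
{
    for (uint32_t p = 0; p < NUM_PAGES; p++)     /* some arbitrary mapping  */
        page_table[p] = (p * 7u + 3u) % NUM_PAGES;

    /* Repeated accesses to the same few pages mostly hit the TLB. */
    for (int i = 0; i < 10000; i++)
        (void)translate((uint32_t)(i % 3) * PAGE_SIZE + 100u);

    printf("TLB hits: %d, misses: %d\n", hits, misses);
    return 0;
}
```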
The following figure illustrates this algorithm graphically: the logical address in the figure corresponds to the virtual address in the discussion above. The address going to the physical memory (i.e. the memory chips) is also called the physical address. The lower bits of the address, here labelled D, address the bytes within a single page; 12 bits are needed for this if the page size is 4 kB.
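As a concrete example with 4 kB pages: the logical address 0x12345 splits into the page number 0x12 (the upper bits, which the TLB translates) and the in-page offset D = 0x345 (the lower 12 bits, which pass through to the physical address unchanged).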

This mechanism is obviously great for large systems where you have plenty of memory and where allocating a minimum of 4 kB is not a problem. In a microcontroller every byte counts, and the additional complexity of an MMU with very fast special memory increases cost. This is why in those systems software techniques are usually used to avoid or mitigate problems due to fragmentation.
Memory allocation in Microcontroller systems
Good examples of memory allocation algorithms can be found in the FreeRTOS operating system for microcontrollers. FreeRTOS provides different memory allocation schemes to choose from; however, the programmer can also write their own algorithm, tailored to a specific application and/or to the hardware being used. Two popular examples of allocation algorithms are given below.
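Whichever scheme is compiled in, application code requests and returns memory through the FreeRTOS functions pvPortMalloc() and vPortFree(). A usage sketch (the function vProcessSample and the buffer size are invented for illustration):

```c
#include "FreeRTOS.h"   /* brings in the pvPortMalloc()/vPortFree() prototypes */

void vProcessSample(void)   /* hypothetical application function */
{
    /* Ask the configured heap implementation for a temporary 512-byte buffer. */
    uint8_t *pucBuffer = pvPortMalloc(512);
    if (pucBuffer == NULL)
    {
        /* The chosen allocator could not satisfy the request. */
        return;
    }

    /* ... use the buffer ... */

    vPortFree(pucBuffer);   /* return the block to the FreeRTOS heap */
}
```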
Heap_2 algorithm of FreeRTOS
This algorithm statically allocates a large chunk of memory at the start of the program. This memory is used as the so-called "heap", from which the dynamic memory allocation algorithm hands out memory blocks.
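In simplified form (the actual heap_2.c source differs in details), this static chunk is just a fixed-size array whose size is set by the configuration constant configTOTAL_HEAP_SIZE; the macro definition below stands in for the value that would normally come from FreeRTOSConfig.h:

```c
#include <stdint.h>

/* In a real project this value comes from FreeRTOSConfig.h. */
#define configTOTAL_HEAP_SIZE  (10 * 1024)

/* The whole heap is one statically allocated array; the allocator then
 * hands out blocks from inside this array at run time. */
static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];
```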
The algorithm uses a "best fit" technique: it maintains a list of free memory blocks, and a new allocation uses the block in the list which best matches the requested memory size, in order to mitigate fragmentation and waste of memory. This algorithm is well suited when the memory blocks allocated and freed by the program are of a few constant sizes rather than arbitrary sizes. The figure below illustrates the algorithm with an example, and a sketch of the best-fit search itself is also given below. The allocated objects here are called Stack and TCB (Task Control Block). These two blocks are always allocated in FreeRTOS when a new execution thread is spawned in a program, and they always have the same size (this remark about the size is the essential information for this example; don't worry if you do not know what a thread is).
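A minimal sketch of the best-fit search (illustration only: the data structures are invented and are not the real FreeRTOS ones; heap_2.c instead keeps its free list ordered by size so that the first fitting block is the best fit):

```c
#include <stdio.h>
#include <stddef.h>

/* One node of a singly linked list of free blocks (illustration only). */
typedef struct FreeBlock
{
    size_t            size;   /* usable bytes in this free block */
    struct FreeBlock *next;
} FreeBlock;

/* Best fit: walk the whole free list and remember the smallest block
 * that is still large enough for the request. */
static FreeBlock *find_best_fit(FreeBlock *head, size_t wanted)
{
    FreeBlock *best = NULL;
    for (FreeBlock *b = head; b != NULL; b = b->next)
        if (b->size >= wanted && (best == NULL || b->size < best->size))
            best = b;
    return best;   /* NULL if no free block can satisfy the request */
}

int main(void)
{
    /* Free list with blocks of 128, 512 and 200 bytes. */
    FreeBlock c = { 200, NULL };
    FreeBlock b = { 512, &c };
    FreeBlock a = { 128, &b };

    FreeBlock *hit = find_best_fit(&a, 150);
    printf("best fit for 150 bytes: %zu-byte block\n", hit->size);  /* 200 */
    return 0;
}
```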

Heap_4 algorithm of FreeRTOS
The so-called Heap_4 algorithm is a variant of the Heap_2 algorithm in which neighbouring free blocks are concatenated (coalesced) to form one larger block. This further reduces the fragmentation problem, in particular when the sizes of the dynamically allocated blocks vary a lot in a program. A rough sketch of the merging step is given below, and the following image illustrates the algorithm.
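The sketch assumes an address-ordered free list with invented data structures (not the actual heap_4.c code); the point is only how two adjacent free blocks become one larger block:

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Free blocks kept in a list ordered by start address (illustration only). */
typedef struct FreeBlock
{
    uint8_t          *start;   /* first byte of the free block          */
    size_t            size;    /* number of free bytes in the block     */
    struct FreeBlock *next;    /* next free block, by ascending address */
} FreeBlock;

/* Merge every pair of neighbouring blocks that touch each other,
 * so that larger contiguous free blocks re-appear. */
static void coalesce(FreeBlock *head)
{
    for (FreeBlock *b = head; b != NULL && b->next != NULL; )
    {
        if (b->start + b->size == b->next->start)
        {
            b->size += b->next->size;   /* absorb the adjacent block */
            b->next  = b->next->next;   /* and unlink it             */
        }
        else
        {
            b = b->next;
        }
    }
}

int main(void)
{
    static uint8_t heap[300];
    /* Two free blocks that happen to be adjacent in memory. */
    FreeBlock b2 = { heap + 100, 50,  NULL };
    FreeBlock b1 = { heap,       100, &b2  };

    coalesce(&b1);
    printf("first free block is now %zu bytes\n", b1.size);  /* prints 150 */
    return 0;
}
```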
