A cache is a small, high-speed memory. It is a temporary storage area that sits between the processor and the main memory (RAM) of a computer so that data can be retrieved faster. It stores copies of frequently used data and instructions. The contents of the cache are copies of data that was previously fetched from main memory or produced by earlier computation. The data stored in the CPU cache need not be synchronized with the actual main memory content every time it changes.
The memory hierarchy is an arrangement of the various types of memory in order of response time. The following diagram describes the memory hierarchy. As we move up the memory hierarchy, the access speed increases but the cost also increases. Example – CPU registers (at Level 1) are the fastest storage but also very costly, hence we cannot build all of the memory out of them. The secondary memory residing at Level 4 is the cheapest storage but its access speed is very slow, so we cannot rely on it alone either.
Hence we have to use a combination of these memories in such a way that we achieve good execution speed at a lower cost.
Levels of Memory in detail –
- Registers (Level 1) – Registers are the data storage closest to the CPU. They are the fastest storage unit (usually 1 clock cycle to access). Registers are generally described by the number of bits they can hold, for example, an “8-bit register”, a “32-bit register” or a “64-bit register”.
- Cache (Level 2) – It is the next fastest storage unit after registers. Its short access time reduces the average cost of accessing data from the main memory. It stores copies of main memory data that the processor accesses frequently. Most modern CPUs have multiple levels of cache.
- Main Memory (Level 3) – This is also known as RAM. It is volatile memory, so its contents are lost once the power is gone.
- Secondary Memory (Level 4) – It is external storage with large capacities (up to terabytes). The data is stored permanently, and it is slow compared to all other levels. Ex- hard drives, flash drives, etc.
CPU Cache Performance
Whenever a processor wants to read from or write to a memory location, it first checks whether the data block is present in the cache, because reading from and writing to the cache is much faster than accessing the main memory.
If the processor finds the required data block in the cache, this is known as a cache hit. If the required data block is not present in the cache and the processor has to access the main memory to read or write it, thus increasing the latency, this is known as a cache miss. In case of a cache miss, the CPU performs the read from or write to the main memory and also adds the block to the cache so that subsequent reads and writes are faster.
The cache performance is measured in terms of Hit Ratio –
Hit Ratio = (cache hits) / (cache hits + cache misses)
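The hit ratio can be illustrated with a small simulation. This is only a sketch: the function names `simulate_cache` and `hit_ratio` are made up for this example, and the least-recently-used (LRU) eviction policy is an assumption, since the text above does not say which block is replaced when the cache is full.

```python
from collections import OrderedDict

def simulate_cache(addresses, capacity):
    """Return (hits, misses) for a tiny cache holding `capacity` blocks.

    Assumption for illustration: the cache is fully associative and
    evicts the least recently used block when it is full.
    """
    cache = OrderedDict()  # keys are block addresses, ordered by recency
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = True             # add the block on a miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits, misses

def hit_ratio(hits, misses):
    """Hit Ratio = hits / (hits + misses); 0.0 if there were no accesses."""
    total = hits + misses
    return hits / total if total else 0.0

trace = [1, 2, 3, 1, 2, 4, 1, 2]
hits, misses = simulate_cache(trace, capacity=3)
print(hits, misses, hit_ratio(hits, misses))  # 4 4 0.5
```

The first three accesses and the access to block 4 miss; the repeated accesses to blocks 1 and 2 hit, giving a hit ratio of 4 / (4 + 4) = 0.5.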
Multi-level CPU caches
Cache memories are fast but very costly. So, to make a trade-off between cost and speed (or latency), we use multilevel caches between the main memory and the processor. In a multilevel caching architecture, when we move from the level closest to the processor (L1) to the levels farther away from it (L2, L3), the latency (the time to read from or write to the cache) increases, but the cost per byte decreases and the storage capacity increases.
- L1 Cache – L1 cache, also known as the primary cache, is the fastest cache but the smallest in size (generally tens of KB per core, for example 32KB-64KB). Whenever the processor starts looking for some instructions or data, it first searches for them in the L1 cache. It is usually embedded in the processor chip.
- L2 Cache – L2 cache is slower than the L1 cache but larger in size (generally 256KB-8MB) and cheaper than the L1 cache. If an instruction is not present in the L1 cache, the processor searches for it in the L2 cache.
- L3 Cache – L3 cache is the slowest among all caches, but it is also the cheapest and has the most capacity (generally 4MB-50MB). In a multi-core processing environment, each core typically has a dedicated L1 & L2 cache but shares the L3 cache with the other cores.
Data flows from RAM first to the L3 cache, then to the L2 cache, and finally to the L1 cache, but when the processor is looking for data, it first searches in the L1 cache, then the L2 cache, and then the L3 cache. If the data is found in any level of the cache, it is known as a cache hit, but if the data is not available in any level of the cache and the processor has to fetch it from the main memory, it is known as a cache miss.
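The search order described above can be sketched as follows. The `lookup` function and the dictionaries standing in for the caches are hypothetical, and the sketch deliberately ignores capacity limits and eviction; it only shows the order in which the levels are searched and how a block is copied into the faster levels once it is found.

```python
def lookup(address, levels, memory):
    """Search L1, then L2, then L3; fall back to main memory on a miss.

    `levels` is a list of (name, dict) pairs ordered from fastest to
    slowest. Returns (value, name_of_level_that_served_the_request).
    Simplification: caches are unbounded dicts, so nothing is evicted.
    """
    for i, (name, store) in enumerate(levels):
        if address in store:
            value = store[address]
            # Cache hit: copy the block into every faster level.
            for _, faster in levels[:i]:
                faster[address] = value
            return value, name
    # Cache miss in every level: fetch from RAM and fill all caches.
    value = memory[address]
    for _, store in levels:
        store[address] = value
    return value, "RAM"

memory = {0x10: "a", 0x20: "b"}
l1, l2, l3 = {}, {}, {0x20: "b"}
levels = [("L1", l1), ("L2", l2), ("L3", l3)]

print(lookup(0x10, levels, memory))  # ('a', 'RAM')  miss in all levels
print(lookup(0x10, levels, memory))  # ('a', 'L1')   now served by L1
print(lookup(0x20, levels, memory))  # ('b', 'L3')   hit in L3
```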
- The Locality of Reference – The locality of reference helps in deciding which data blocks should be placed in the CPU cache. It is generally of two types –
- Temporal Locality – According to temporal locality, if a particular memory location is referenced at one point in time, it is likely that the same location will be referenced again in the near future.
- Spatial Locality – According to spatial locality, if a particular memory location is referenced at one point in time, it is likely that memory locations close to it will be referenced in the near future.
- Web Cache – A web cache (a.k.a. HTTP cache) is temporary storage used for storing frequently accessed static data such as HTML, CSS, images, etc. to reduce latency and server load.
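The effect of temporal and spatial locality on a CPU cache can be sketched with a toy model. Everything here is assumed for illustration: the function `count_misses`, the line size of 8 addresses, the capacity of 4 lines, and LRU replacement. The model loads a whole line of neighbouring addresses on each miss, which is what rewards spatial locality.

```python
from collections import OrderedDict

def count_misses(addresses, line_size=8, num_lines=4):
    """Count misses in a small LRU cache that loads whole lines.

    Each miss brings in the `line_size` neighbouring addresses that
    share a line, so nearby accesses afterwards become hits.
    """
    cache = OrderedDict()  # keys are line numbers, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)
    return misses

ROWS, COLS = 16, 16  # a 16x16 array stored row by row

# Row-by-row traversal touches consecutive addresses (good spatial locality).
row_major = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
# Column-by-column traversal jumps COLS addresses on every step.
col_major = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

print(count_misses(row_major))     # 32
print(count_misses(col_major))     # 256
print(count_misses([0, 1, 2] * 10))  # 1 (temporal locality: same line reused)
```

Both traversals read the same 256 addresses, but in this model the row-by-row order misses only once per line, while the column-by-column order misses on every access because each line is evicted before it is reused.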