System Architecture and PCIe Basics

Introduction

This blog post contains the notes I took while watching the YouTube video series “System Architecture for BIOS developers” by Sarathy Jeyakumar. Link to the YouTube playlist: https://www.youtube.com/playlist?list=PLBTQvUDSl81dTG_5Uk2mycxZihfeAYTRm

I started watching this series mainly to understand the basics of PCIe, and I found it very useful. If you have not seen it, I highly recommend watching it.

System Memory Address Map

How system memory address is laid out in x86 architecture

As someone who didn’t know how the memory address map worked, I found it difficult to visualize in my head. I was under the impression that the diagram shown below is a 1-to-1 mapping of my RAM to addresses. But that’s not the case. The memory map is the address space that the CPU uses to reach different components(how/why we need this is discussed in later sections). RAM is just one of the components that is mapped into the system memory address map. I am adding this note so that it is a little clearer to someone who is learning about the system memory map for the first time.

  1. There are two different types of address spaces: memory address space and IO address space.
  2. Memory address space usually maps to a physical storage device, and it is accessed via CPU instructions like MOV.
  3. IO address space usually maps to IO devices, and it is accessed by instructions like IN/OUT.
memory-map

Memory Space

  1. Memory space depends on how many address bits we have.
  2. If the processor can support ‘x’ address bits, then the address space runs from 0 to 2^x - 1.
  3. Modern x86-64 processors support many more physical address bits. The exact number depends on the processor model and can be as high as 52.
  4. The original IBM PC and PC/XT, which were based on the 8088 processor, had only 20 address bits. 2^20 = 1048576 bytes = 1 MB of address space.
  5. This 1 MB is the bottom-most memory area in the diagram, and it is still maintained for backward compatibility. This 1 MB space has the System BIOS(top 128 KB), Interrupt Vector Table(IVT), BIOS Data Area(BDA), SMM Area and Expansion ROM Area.
  6. The next generation of processors were able to support up to 32 address bits. 2^32 = 4 GB
  7. Some of the address space is carved out for Memory Mapped Input Output(MMIO). There are two regions: MMIO-Low, which sits just below 4 GB, and MMIO-High, which sits at the top of the map, above the high memory.
  8. MMIO address range is not mapped to actual physical memory(DRAM).
  9. Memory regions which are in green are mapped to physical memory and any access to this memory range goes to DRAM
  10. To summarize, we have holes in the memory space(for MMIO). Actual physical memory map starts from 0 to TOLM(Top of Low Memory) and from 4 GB to TOHM(Top of High Memory). 
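
The layout described in the list above can be sketched as a small classifier. This is only an illustration: the TOLM and TOHM values are made up(real platforms program them at boot), and the sketch ignores the legacy holes below 1 MB.

```python
# Hypothetical example values; real platforms program these at boot.
FOUR_GB = 1 << 32
TOLM = 0x8000_0000       # assumed Top of Low Memory (2 GB)
TOHM = 0x2_8000_0000     # assumed Top of High Memory (10 GB)


def address_space_size(address_bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


def classify(addr: int) -> str:
    """Say whether an address is backed by DRAM or falls into an MMIO hole.

    Simplification: the legacy holes below 1 MB are treated as DRAM.
    """
    if addr < TOLM:
        return "DRAM"
    if addr < FOUR_GB:
        return "MMIO-Low"
    if addr < TOHM:
        return "DRAM"
    return "MMIO-High"
```

For example, with these assumed values an access to 0xFEC00000 lands in MMIO-Low, while an access at exactly 4 GB goes to DRAM.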

Segments/Legacy/1 MB Region

segments

Section 0xF and 0xE

Segmentation

  1. A 20-bit memory address is formed from a segment and an offset. Each segment is 64 KB in size.
  2. To cover the 1 MB space, 16 segments(0x0 to 0xF) are required(16 * 64 KB = 1024 KB = 1 MB).
  3. The segment and the offset are each 16 bits wide. They are converted into a 20-bit address by shifting the segment left(<<) by 4 and adding the offset to it. Example: if the segment is 0xF000 and the offset is 0x8000, then the physical address is (0xF000 << 4) + 0x8000 = 0xF8000.
  4. Segments 0xF and 0xE contain the BIOS. Access to segments 0xF and 0xE can go either to the BIOS Flash chip or to main memory(DRAM). Usually, the BIOS resides in a separate Flash chip on the motherboard.
  5. BIOS Flash is part of the IO subsystem. Whenever we power up or reset the system, execution starts at 0xF000:0xFFF0, which translates to physical address 0xFFFF0. This address is also called the reset vector.
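
The segment:offset translation above is small enough to write out. A minimal sketch(the function name is my own; the 20-bit mask models the wrap-around of the 8086's 20 address lines):

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Translate a 16-bit segment and 16-bit offset to a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the 20 address bits the 8086 drives.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_physical(0xF000, 0x8000)))  # example from the text
print(hex(real_mode_physical(0xF000, 0xFFF0)))  # reset vector
```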

BIOS Shadowing

  1. When we first power up the platform, segments 0xE and 0xF point to the BIOS Flash, so that the BIOS code can start executing. Later on, during BIOS execution, once memory has been initialized and is available, the BIOS copies some portions of itself to main memory(DRAM) and switches the map, so that future accesses to these segments always go to main memory(DRAM). This process is called BIOS Shadowing. From then on, segments 0xF and 0xE are called BIOS Shadows.
  2. Reason why we do BIOS shadowing is, we can’t access it every time from Flash chip, which is slower. Also, BIOS is compressed, and we need to extract it and run somewhere in main memory.

Section 0xD and 0xC

These segments are called option ROM or Expansion ROM. This will be covered in detail later.

Section 0xA and 0xB

These segments are mapped either to SMRAM or to VGA buffer

Section 0x0 to 0x9

This is total 640 KB region(64 KB*10)

Segment 0x0

  1. The first 1 KB(0x0 - 0x3FF) within segment 0 contains the Interrupt Vector Table(IVT). The IVT is like an array of pointers, in which each pointer points to an Interrupt Service Routine(ISR). This array is indexed by the interrupt vector number. In x86 architecture, each interrupt has an 8-bit vector number associated with it. So, we can have interrupt vector numbers from 0 to 255(8 bits).
  2. There are two classes of interrupt: software interrupts and hardware interrupts. Both share the vector numbers in the range from 0 to 255.
  3. In case of SW interrupt, the vector number is provided as part of the instruction. In case of HW interrupt, the interrupt controller provides the vector number.
  4. From address 0x400(after first 1 KB) we have BIOS Data Area(BDA)  which BIOS builds during boot up. To know more about what data is present in the BDA area, check this link: https://stanislavs.org/helppc/bios_data_area.html
  5. The Extended BIOS Data Area(EBDA) size can range from 0 to 255 KB. The EBDA sits at the top of the 640 KB region and extends downward. Depending on the size of the EBDA(in multiples of 1 KB), whatever remains is the usable memory size in segments 0 to 9.
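
To make the "array of pointers" idea concrete, here is a minimal sketch that looks up one IVT entry in a dump of the first 1 KB of memory. It relies on the real-mode layout, where each entry is 4 bytes(a 16-bit offset followed by a 16-bit segment), so 256 entries fill exactly 1 KB. The function names and the dump are made up for illustration.

```python
import struct
from typing import Tuple

IVT_ENTRY_SIZE = 4  # real mode: 2-byte offset followed by 2-byte segment


def ivt_entry_address(vector: int) -> int:
    """Physical address of the IVT entry for an 8-bit vector number."""
    if not 0 <= vector <= 255:
        raise ValueError("vector number must fit in 8 bits")
    return vector * IVT_ENTRY_SIZE


def read_ivt_entry(memory: bytes, vector: int) -> Tuple[int, int]:
    """Return (segment, offset) of the ISR for the given vector.

    `memory` is assumed to be a dump that starts at physical address 0.
    """
    offset, segment = struct.unpack_from("<HH", memory, ivt_entry_address(vector))
    return segment, offset
```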

BIOS Boot Flow

BIOS Boot Flow from HW perspective

  1. In the previous section, we saw that the BIOS starts at the reset vector(0xFFFF0). This is for 16-bit mode on older processors. But today’s processors are mostly 32-bit and 64-bit processors, and the entry point/reset vector on these processors sits just below 4 GB, at 0xFFFFFFF0. So, when power is applied, the processor starts fetching code from 0xFFFFFFF0 in increasing order until it encounters a JMP/branch instruction. Only 16 bytes remain before address 0xFFFFFFFF, so we need to jump before hitting it. Where we jump depends on the BIOS implementation. In legacy BIOS, we do a far jump from the 4 GB area to the legacy reset vector space.
  2. In today’s implementation on modern processors, BIOS switches to 32-bit mode and continues its execution from 4 GB area.

What does BIOS do after reset/power up

  1. Switch to 32 bit mode, since we are executing at 4 GB area which requires 32-bit.
  2. Then it locates and loads the uCode patch. What it means is that BIOS will carry some microcode patches, which it will load into the processor during boot if required.
  3. Then it will set up cache as RAM(CAR). Cache here refers to the cache available in the processor. Until this point we are executing from the BIOS Flash region, with no memory available: we can’t use a stack and we can’t use variables, so the code has to be stackless. To get around this, we make the CPU cache look like memory and use it for the stack until actual memory is available. Once we have CAR, we can switch to C code.
  4. Identify platform information and/or user inputs from BIOS setup options and pass this to Silicon init code.
  5. The silicon init module is vendor proprietary code, which typically does memory initialization. In multiprocessor systems, it also takes care of setting up the interconnect between the processors. Once this step is complete, the basic silicon functionality has been initialized and memory is available.
  6. Now, since memory is available, we switch out from CAR and start using main memory.
  7. Then, we wake up all the CPU cores in the system and initialize them.
  8. Setup System Management Mode(setup SMRAM)
  9. Then, BIOS does PCIe bus enumeration and resource allocation. During the enumeration, it walks through the hierarchy to find bridges/switches, end point devices and calculate MMIO memory requirements or any other resource requirements and allocates them.
  10. Launch option ROMs if found during PCIe enumeration.
  11. Setup power management.
  12. Setup OS interface tables such as ACPI(mainly used to expose platform capabilities to OS), Memory map(to know usable and reserved memory), SMBIOS structures(exposes information about the HW), etc.
  13. Invoke BIOS setup if it is requested by the user. Usually done by pressing F2 key during boot up.
  14. Launch OS

Transaction Flows and Address Decoding

  • Let’s say the processor issues a transaction to memory, for example through a MOV instruction. How is this MOV(read/write) to a memory address handled by the processor/chipset HW? There should be some decoding/routing to guide the transaction from the processor to the right destination.
transition flow 1
  • Consider the simple example above, where we have only one CPU core. In this example, we are executing two instructions. The first loads the value 8000h into EBX, and the second stores the value 1234h at the address(8000h) pointed to by EBX.
  • The task of the processor is just to execute the instructions. It doesn’t know what type of address the instructions are trying to access. So, there should be some kind of decoding logic to decide which address space the transaction should be sent to: the normal memory address space or the MMIO address range.
transition flow 2
  • As seen in the above image, modern processors have both the CPU core and the Chipset/System Agent/Memory Controller Hub enclosed inside a single chip/package, even though the CPU and the chipset are functionally different components. The Chipset/SA/MCH takes care of decoding the memory address. It also does other operations, which will be covered in upcoming sections.
transition flow 3
  • Let’s take a modern processor with multiple cores. All the cores are connected to the System Agent(SA). The SA has multiple components, such as Memory Controllers(MC) connected to memory and PCIe root ports. In Intel chips, we typically have another South Bridge component called the Platform Controller Hub(PCH)/IO Controller Hub(ICH). The PCH/ICH has some IO devices connected to it. It also has a path to the BIOS Flash.
  • To summarize, CPU cores are functionally separate from the SA. A CPU core’s main task is to execute instructions. If a request has to go out to the rest of the system(like reading an IO device or a memory device), the CPU sends the transaction to the SA. The SA looks at the transaction, identifies whether it is memory bound or IO bound, and routes it accordingly.
transition flow 4
  • When a transaction comes out of the CPU, it indicates to the SA whether it is a memory or an IO transaction. An instruction like MOV maps to a memory transaction. An instruction like IN/OUT produces an IO bound transaction. The SA decides on sending the transaction to the appropriate memory/IO address.

Touching few more basics with 8086 CPU

8086 cpu transaction flow
  1. In the 8086 CPU, let’s focus on 3 important signals: Read(RD), Write(WR) and IO/MEM. These 3 signals indicate to the rest of the system what it should do with the transaction coming out of the CPU. We have a 20-bit address bus(1 MB of address space) and a 16-bit data bus. When the CPU executes an instruction, it puts the address and the data out on these buses. There are some more signals to coordinate this properly, which are not discussed here.
  2. When the CPU executes an IO instruction, IO signal will be set to 1. If CPU executes other instruction like MOV, then this signal will be set to 0.
  3. The CPU says whether it is doing a read or a write through the RD/WR signals, and it says what kind of instruction produced the transaction through the IO/MEM signal. It is then up to the rest of the system to decode the transaction and guide it to the appropriate devices or agents.

Transaction Flows and address decoding – Continuation

  1. The above diagram is a representation of a typical single socket system with a little more detail. We have a socket which contains 8 CPU cores. All these cores talk to the System Agent(SA). The SA takes care of decoding the transaction and deciding where to send it. In the above diagram we have two Memory Controllers(MC), each connected to two channels. We have 2 DDR DIMMs(dual in-line memory module)/RAM sticks connected to each channel. In total, we have 8 DIMMs connected. Then we have the IO subsystem, which has a root complex that spawns multiple PCIe Root Ports. The IO subsystem is also connected to the PCH(Platform Controller Hub). The PCH has many IO interfaces such as SATA, LAN, USB, SPI(Serial Peripheral Interface), LPC, etc.
  2. More details on IO subsystem: Note that the MMIO range can be split across multiple root ports. When the BIOS boots, it programs the SA with the MMIO-Low and MMIO-High address ranges, so that transactions in those ranges are sent to the IO subsystem. In the IO subsystem, we have a bunch of root ports. PCIe root ports spawn bus numbers, and there can be PCIe switches/end point devices connected to a root port(RP). When we get a transaction in the MMIO address range, it should be claimed by one of the root ports. Each root port has range registers inside it, referred to here as Base Address Registers(BAR), which are programmed by the BIOS. Once the BIOS has programmed the ranges, each root port checks whether an incoming transaction targets the address range programmed in its registers. If it does, the transaction is claimed by that root port. If the transaction is not claimed by any of the root ports, it falls down to the default path. In Intel terminology this is the DMI(Direct Media Interface) port, which sends the transaction to the Platform Controller Hub(PCH). The PCH is also known as the south bridge.
  3. When a transaction reaches the IO subsystem and one of the root ports claims it, this is called positive(+ve) decoding. If a transaction is not claimed by any of the root ports, it goes to the default path(PCH). This is called subtractive(-ve) decoding.
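
The positive and subtractive decoding described above can be sketched as follows. This is a toy model, not how the hardware is implemented: the RootPort type, the address windows and the "DMI/PCH" label are all made up for illustration.

```python
from typing import List, NamedTuple


class RootPort(NamedTuple):
    name: str
    base: int    # first address claimed by this root port
    limit: int   # last address claimed by this root port (inclusive)


def route_mmio(addr: int, root_ports: List[RootPort]) -> str:
    """Return the name of the agent that ends up with the transaction."""
    for rp in root_ports:
        if rp.base <= addr <= rp.limit:
            return rp.name        # positive decode: a root port claims it
    return "DMI/PCH"              # subtractive decode: default path


# Hypothetical windows programmed by the BIOS.
ports = [
    RootPort("RP0", 0x9000_0000, 0x90FF_FFFF),
    RootPort("RP1", 0x9100_0000, 0x911F_FFFF),
]
print(route_mmio(0x9100_4000, ports))  # claimed by RP1
print(route_mmio(0xFED0_0000, ports))  # nobody claims it, goes to DMI/PCH
```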

Found this image on Wikipedia, which shows what a typical motherboard looks like:

More details on BAR

Typical endpoint config space

end point config space
  1. In the above diagram, we can see there are multiple BAR registers. These BAR registers are a set of range registers that are programmed by the BIOS, telling the device: this is the address range that belongs to you, and you can claim the transactions which fall in this address range.
  2. Let’s say you have a video card which has a 2 MB buffer in it. Now, let’s look at how this memory gets mapped into the overall system address space, so that the processors can access this 2 MB of memory. This is where the BAR registers come into the picture. The BIOS allocates 2 MB of address space from the MMIO memory space and gives the address details to the end point device. From then on, whenever a transaction targets that particular address range, it percolates down to the end point device and maps to the 2 MB buffer in the video card. This is how end point devices can map their own resources, like memory, into the system address space, so that the processors can access those resources.

PCI and PCIe Basics

  1. PCI is not used extensively anymore and is considered legacy technology. It has been replaced by PCI Express. But the SW mechanism to access PCIe devices is very similar to the one for PCI devices.
  2. A PCIe device is something which is attached to a bus. A device can be of two types: an end point device or a bridge device. An end point device is anything which does not spawn a bus behind it(like a graphics card, PCIe SSD, etc.), while a bridge device does. Each device can have multiple functions within it. There can be up to 256 buses(0-255, 8 bits), 32 devices per bus(0-31, 5 bits) and 8 functions per device(0-7, 3 bits).
  3. On each PCIe device, within each function, we have registers. A typical PCIe device exposes some registers which are used by the SW(SW here means BIOS) to configure the device. Those registers are called config registers. In PCI, we can only have up to 256 bytes of registers. But in PCIe, we can have up to 4 KB.
  4. As we have already seen in the previous sections, a PCI/PCIe device can be either an end point device(Type 0) or a bridge device(Type 1). There is a field called header type in the config space. Software(BIOS) needs to check this field to find out whether it is a Type 0 or a Type 1 device.
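
As a small illustration of the header type check in the last point, here is a sketch that decodes the field from a dump of a function's config space. It relies on the standard layout, where the header type is the byte at offset 0x0E, its low 7 bits give the layout(0 for Type 0, 1 for Type 1) and its top bit marks a multi-function device. The function name and the dump are made up.

```python
from typing import Tuple

HEADER_TYPE_OFFSET = 0x0E  # byte offset of the header type field


def decode_header_type(config_space: bytes) -> Tuple[str, bool]:
    """Return (kind of device, is_multi_function) from a config space dump."""
    raw = config_space[HEADER_TYPE_OFFSET]
    layout = raw & 0x7F                  # bits 6:0 select the header layout
    multi_function = bool(raw & 0x80)    # bit 7 marks a multi-function device
    kinds = {0: "endpoint (Type 0)", 1: "bridge (Type 1)"}
    return kinds.get(layout, "other"), multi_function
```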

PCIe Hierarchy/ PCIe config cycle

In PCI, device numbers were statically assigned at the platform level. Each PCI device is given a device ID through its IDSEL line. IDSEL is like a chip select here, which tells which device to select, and some bits of the address lines indicate which device ID it is connected to. This is only for PCI. In PCIe it is completely different: device IDs are assigned in hardware at the silicon level, not at the platform level.

In order to assign the bus and device numbers for PCIe Hierarchy, BIOS takes some info from the config register(Primary, Secondary, Subordinate bus).

  1. If the device is type 1 device(bridge), then its config space has 3 fields namely Primary, Secondary and Subordinate bus numbers.
    • Primary – Primary bus number is the bus number on which the device sits
    • Secondary – The new bus number which is created directly behind that device
    • Subordinate – It specifies how deep the tree is behind the bus. It holds the number of the last(highest numbered) bus in that hierarchy/tree.
  2. Based on the values that are programmed into these three fields by the BIOS, a transaction will either terminate at that root port or fall down to the next bus behind the root port/bridge device. Conceptually, a root port is basically a bridge device which sits on the root bus(bus 0).
pcie-hierarcy without numbers

The above image contains a few PCIe devices connected to the root complex. Now, let’s see in detail how these devices are assigned bus and device numbers by the BIOS as part of the PCIe hierarchy process.

PCIe hierarchy discovery works as a depth-first search. The table below shows the bus/device number and the primary/secondary/subordinate values at each step, and the image shows the bus/device numbers which are assigned at the end. Let’s see step by step how the BIOS assigns the bus/device numbers to each of these devices.

Config cycle | Bus, Device      | Primary/Secondary/Subordinate
1            | Bus 0, Device 0  | 0/1/FF
2            | Bus 1, Device 0  |
3            | Bus 1, Device 1  |
4            | Bus 1, Device 2  |
5            | Bus 0, Device 0  | 0/1/1
6            | Bus 0, Device 1  | 0/2/FF
7            | Bus 2, Device 0  |
8            | Bus 2, Device 1  | 2/3/FF
9            | Bus 3, Device 0  |
10           | Bus 3, Device 1  |
11           | Bus 3, Device 2  |
12           | Bus 2, Device 1  | 2/3/3
13           | Bus 0, Device 1  | 0/2/3
14           | Bus 0, Device 2  |
15           | Bus 0, Device 3  |
pcie hierarchy after bus number allocation
  1. Since it is a depth-first search, we start from bus 0 and traverse through all the devices from the left. In this example, the BIOS first targets Bus 0, Device 0 and reads its config space. It typically reads the device ID and vendor ID first, and then the type of the device. Here, device 0 is a bridge device. Whenever the BIOS sees a bridge device, it reads its config space to get the required info.
  2. When a new bridge device is found, a new bus number is allocated to it first, so that we can continue to traverse further and find the devices which are connected below it. Before moving further, the BIOS programs 3 config space fields(primary, secondary and subordinate bus numbers). Here, primary is 0(since this bridge device sits on bus 0), secondary is 1(the newly created bus number which is allocated to it) and subordinate is 0xFF(-1). We assign the subordinate as -1 first, since at this point we don’t know how deep the hierarchy goes below this bus. After this configuration, any transaction which comes from the CPU for bus 1 will be claimed by this bridge and sent down to the correct device.
  3. Traversal continues, and we traverse through the devices which are present on bus 1 from the left side. Here, we have 3 devices connected to bus 1. Device numbers(from 0) are given to these devices, and no more traversal is required on this bus since all three devices are end point devices.
  4. Once these 3 devices are configured, the traversal goes back to the bridge(device 0 on bus 0), and its subordinate field is updated. We update it at this point, since we now know how deep the hierarchy goes from here. Here, the highest bus number behind the bridge is bus 1.
  5. Now, since we have visited all the devices which are connected behind bus 0, device 0, we continue our traversal to the next device which is attached to bus 0. In our example, device 1 is a bridge device which has more devices connected to it. We do the same thing as in the previous steps. Since it is a bridge device, the BIOS assigns primary, secondary and subordinate bus numbers to it. Here, for device 1, primary will be 0(since it sits on bus 0), secondary will be 2(the newly assigned bus number for this bridge device) and subordinate will be 0xFF(-1 is assigned since we don’t know the depth yet). Once this is configured, we start to read the devices which are connected to bus 2. In the example above, we have 2 devices connected to it. So, we continue our traversal to these devices.
  6. Device 0 from left is an end point device. Since it is an end point device, we don’t need to configure anything else apart from assigning the device number. The next device is a bridge device. We repeat the same steps as above for this bridge device. We configure primary/secondary/subordinate for this device. Here it is 2(bus on which it sits)/3(newly created bus number)/FF(-1 we don’t know the depth yet) and then we start to look into the devices which is connected to this new bus.
  7. In our example, bus 3 is connected to 3 end point devices. So, no buses are behind it, and it will go back to its parent bus to complete the traversal.
  8. Now, traversal comes back to bus 2, device 1. Here we update the correct subordinate value(3 – since it is the maximum bus number we saw in below traversal).
  9. Now, traversal comes back to bus 0, device 1. Here we update the correct subordinate value(3 – since it is the maximum bus number we saw in below traversal).
  10. Now, traversal is complete for devices 0 and 1 on bus 0. We continue the traversal with the remaining devices on bus 0. Here, we have 2 devices(2 and 3), which are end point devices. Since they are end point devices, no more traversal is required. The PCIe hierarchy is now complete, and all the devices have been assigned bus and device numbers.
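
The walk-through above can be condensed into a short recursive sketch. It is a toy model of the depth-first search, not real BIOS code: the Device class and the topology are my own stand-ins for config space accesses, and only the bus number fields of bridges are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    behind: Optional[List["Device"]] = None  # None: end point, list: bridge
    primary: int = 0
    secondary: int = 0
    subordinate: int = 0


def enumerate_bus(devices: List[Device], bus: int) -> int:
    """Assign bus numbers depth first; return the highest bus number used."""
    highest = bus
    for dev in devices:
        if dev.behind is None:
            continue                      # end point: nothing behind it
        dev.primary = bus                 # bus the bridge sits on
        dev.secondary = highest + 1       # newly created bus behind it
        dev.subordinate = 0xFF            # depth not known yet
        highest = enumerate_bus(dev.behind, dev.secondary)
        dev.subordinate = highest         # last bus found behind the bridge
    return highest


def endpoints(prefix: str, count: int) -> List[Device]:
    return [Device(f"{prefix}{i}") for i in range(count)]


# Same shape as the example: two bridges and two end points on bus 0.
inner = Device("bridge on bus 2", behind=endpoints("c", 3))
first = Device("bus 0, device 0", behind=endpoints("a", 3))
second = Device("bus 0, device 1", behind=[Device("b0"), inner])
root_bus = [first, second, Device("bus 0, device 2"), Device("bus 0, device 3")]

enumerate_bus(root_bus, 0)
for bridge in (first, second, inner):
    print(bridge.name, bridge.primary, bridge.secondary, bridge.subordinate)
```

The final values match the last entry for each bridge in the table: 0/1/1, 0/2/3 and 2/3/3.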

PCIe Config

From previous sections, we know that we can have 256 Buses and each bus can have up to 32 devices and each device can have up to 8 functions and each function can have either 256 bytes(for legacy PCI device) or 4 KB(for PCIe devices) config registers. In this section, let’s see how we can access those config registers. We have two different ways to access these registers.

  1. IO mode – Used in legacy PCI devices. PCIe devices also support this.
  2. MMIO mode – Only supported by PCIe devices.

For legacy PCI devices

First, let’s see how we can access on legacy PCI devices. This mode will also work in new PCIe devices.

  1. We need to make two accesses: a write to IO port 0xCF8(called the index register), followed by a read or write of IO port 0xCFC(called the data register).
  2. For reading a config register, first we write the address of the config register which we need to access(say bus 3: dev 2: fun 5: reg 40) into the index register. Then we do a read from the data register to get the contents of the requested config register.
  3. For writing data to a config register, we do the same operation. First, we write the config register address to the index register, and then we write our data to the data register.
  4. Let’s see how config register address is calculated with below example:

We already know we can have up to 256 buses(8 bits), 32 devices(5 bits) connected to each bus, and up to 8 functions(3 bits) per device, and each function can have either 256 bytes(8 bits) or 4 KB(12 bits) of registers.

Let’s say we need to calculate the address for bus 3, device 2, function 5 and register 40

io mode address calculation

In the above example, we can see how a 4 byte address is formed. Byte 0 contains the register which we need to access(it is restricted to 8 bits, so the maximum value it can access is 255), Byte 1 contains the device number and function number combined into a single byte, and Byte 2 contains the bus number. In Byte 3, the MSB is set to 1 and the other bits are set to 0.
 
Now, if we want to read bus 3, device 2, function 5, register 40(hex), then first we write this address(0x80031540) to the index register and then read back from the data register to get the contents of this register.
 
Few more points:
The address which is formed by the above method should be DWORD aligned. So, let’s say you want to read register 41 instead of 40. In this case, you still need to write the register 40 address to the index register(to make sure it is DWORD aligned). But now, instead of reading from 0xCFC(data register), we read from the next byte(0xCFC+1 = 0xCFD).

Also, in this mode we use IN/OUT instructions, which take more CPU cycles and are more time-consuming than MMIO mode.
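
The address calculation above can be checked with a few lines of code. This sketch only builds the value that would be written to 0xCF8 and picks the data port; it does not perform any port IO, and the function names are my own.

```python
CONFIG_ADDRESS_PORT = 0xCF8  # index register
CONFIG_DATA_PORT = 0xCFC     # data register


def cf8_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the 32-bit value to write to the index register."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("bus/device/function/register out of range")
    return ((1 << 31)              # MSB of byte 3: enable bit
            | (bus << 16)          # byte 2: bus number
            | (device << 11)       # byte 1, upper 5 bits: device number
            | (function << 8)      # byte 1, lower 3 bits: function number
            | (register & 0xFC))   # byte 0: register, forced DWORD aligned


def data_port(register: int) -> int:
    """IO port to access for a register that is not DWORD aligned."""
    return CONFIG_DATA_PORT + (register & 0x3)


print(hex(cf8_address(3, 2, 5, 0x40)))  # 0x80031540
print(hex(data_port(0x41)))             # 0xcfd
```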

For New PCIe devices

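PCIe devices use a new method called MMCFG(Memory Mapped Config) to access config space. The config space of every possible bus, device and function is mapped into one contiguous region of the memory address space: 256 buses * 32 devices * 8 functions * 4 KB = 256 MB.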

config memory in system memory

Let’s see how we can access the required config register for a device in this 256 MB region.

pcie config memory

Let’s take the same example as before and see how we calculate the register address here.

Bus 3, Device 2, Function 5, Register 40. It is similar to the previous example, with small changes. Here we have 12 bits(since 4 KB) for the register. So, the lowest 12 bits are for the register, the next byte is for the device and function, and the next byte is for the bus.

3:2:5:40 => 0x03:0x15:0x040 => 0x0315040

Now, we need to add this(0x0315040) to the MMCFG_BASE address to create the final address. Let’s say, for example, that MMCFG_BASE is at 2 GB. Then 2 GB to 2 GB + 256 MB is the MMCFG space.

2 GB is 0x80000000 + 0x0315040 = 0x80315040. So, instead of using IN and OUT instructions(like how it is used in legacy mode), we can directly use MOV instruction on this address. We can treat this address just like a normal pointer.
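
The same calculation as a short sketch. The MMCFG_BASE value of 2 GB is just the example value used above, and the function names are made up.

```python
MMCFG_BASE = 0x8000_0000  # example value from the text (2 GB)


def mmcfg_offset(bus: int, device: int, function: int, register: int) -> int:
    """Offset of a config register inside the 256 MB MMCFG region."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 4096):
        raise ValueError("bus/device/function/register out of range")
    return (bus << 20) | (device << 15) | (function << 12) | register


def mmcfg_address(base: int, bus: int, device: int,
                  function: int, register: int) -> int:
    """Memory address that maps to the requested config register."""
    return base + mmcfg_offset(bus, device, function, register)


print(hex(mmcfg_offset(3, 2, 5, 0x40)))                # 0x315040
print(hex(mmcfg_address(MMCFG_BASE, 3, 2, 5, 0x40)))   # 0x80315040
```

The largest possible offset is for bus 255, device 31, function 7, register 0xFFF, which is one byte short of 256 MB.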

But how do we know what MMCFG_BASE address was set by the BIOS? The BIOS publishes it in an ACPI table called the MCFG table, from which software can find the base address.

Sample Program on DOS

Sample Program for reading or writing to config space in legacy PCI – https://youtu.be/NgzT1JfBUr0

In this video, it is explained how to do sample read/write to config registers in DOS box.

How to Find MCFG table and MMCFG space

How to Find MCFG table and MMCFG space: https://youtu.be/dWrAaawmgvQ

The video demonstrates how to find MMCFG_BASE using the Read&Write Utility program.

Few points from the video:

  1. MMCFG_BASE is exposed to the OS by an ACPI table called the MCFG table. But how do we find the MCFG table?
  2. There is a structure that starts with the signature “RSD PTR ”. It contains a pointer to the RSDT(Root System Description Table). The RSDT is itself a table of pointers, each pointing to a different ACPI table. Some of the tables it points to are HPET(High Precision Event Timer), SRAT(System Resource Affinity Table) and the MCFG table. Our goal is to find the MCFG table in this list of pointers.
  3. To find RSD PTR when running in legacy 16-bit mode, you need to search for the string “RSD PTR ” either in segments E/F or in the EBDA(Extended BIOS Data Area).

If it is a UEFI boot, then the EFI system table is passed as a parameter to the OS loader. This table has the information on where the RSD PTR structure, and through it the RSDT, is located.
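
The legacy search can be sketched as a scan over a memory dump. This is an illustration under a few assumptions: it works on a bytes object(for example a dump of segments E/F) rather than on real physical memory, it relies on the ACPI 1.0 layout of the structure(8-byte signature on a 16-byte boundary, a checksum that makes the first 20 bytes sum to zero, and the 32-bit RSDT address at offset 16), and the function name is my own.

```python
import struct
from typing import Optional

RSDP_SIGNATURE = b"RSD PTR "   # 8 bytes, note the trailing space
RSDP_V1_LENGTH = 20            # bytes covered by the ACPI 1.0 checksum


def find_rsdt_address(region: bytes) -> Optional[int]:
    """Scan a memory dump for the RSD PTR structure; return the RSDT address."""
    for off in range(0, len(region) - RSDP_V1_LENGTH + 1, 16):
        candidate = region[off:off + RSDP_V1_LENGTH]
        if candidate[:8] != RSDP_SIGNATURE:
            continue
        if sum(candidate) & 0xFF != 0:
            continue               # bad checksum: a stray string, keep looking
        (rsdt_address,) = struct.unpack_from("<I", candidate, 16)
        return rsdt_address
    return None
```

From the RSDT address, the next step would be to walk its pointer list and pick the table whose own signature is "MCFG".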

PCIe MMIO Resource Assignment

In this section, we will see how the resources which are required for MMIO are assigned.

Transactions targeting the IO subsystem can be of two types: Config transactions or MMIO transactions. Config transactions target the config space in the PCIe device. The BIOS uses this config space to discover the topology and assign bus numbers, as seen in previous sections.

Just recap of what MMIO means: Even though the transaction from CPU contains the system address, the transaction is actually mapped to the IO subsystem.

mmio example

Consider the above example, where we are trying to load a value from a memory address. We load the address 0x12345678 into the EBX register and then load the data pointed to by that address into the EAX register. When the processor executes this instruction, it doesn’t know or care whether this memory address points to memory or to the IO subsystem. When this transaction goes out of the CPU core to the System Agent(SA), the SA determines whether the transaction should be sent to the IO subsystem or to the memory subsystem.

Why would PCIe device need MMIO range? Most likely it has its own internal memory and that memory has to be mapped to the system memory.

This MMIO memory mapping is taken care by the BIOS. Depending on how much memory is requested by the device, it is either mapped to MMIO high or MMIO low address range.

type 0 bar

Let’s look at the format of the Base Address Register and how it is mapped to the required system memory. The format of the BAR register is shown in the diagram below. Bit 0 says whether it is memory or legacy IO space. The next 2 bits specify whether it is a 32-bit decoder or a 64-bit decoder. If it is a 32-bit decoder, then an MMIO-Low address range is assigned, and if it is a 64-bit decoder, then an MMIO-High address range is assigned. There are only 32 bits in a Base Address Register, but when the device claims it can decode a 64-bit address, the next BAR register is combined with it to make a 64-bit BAR register.

bar format
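
The bit fields described above can be decoded with a short sketch. The encodings used here are the standard ones(bit 0 set means IO space, bits 2:1 equal to 10b mean a 64-bit memory decoder, and the low 4 bits of a memory BAR are not part of the address). The function name and return format are made up for illustration.

```python
from typing import Tuple


def decode_bar(bar: int, next_bar: int = 0) -> Tuple[str, int]:
    """Return (kind, base address) for a BAR value.

    `next_bar` is the following BAR register, which holds the upper
    32 bits of the address when the decoder is 64-bit.
    """
    if bar & 0x1:                          # bit 0 = 1: legacy IO space
        return "io", bar & 0xFFFFFFFC
    decoder = (bar >> 1) & 0x3             # bits 2:1: decoder width
    base = bar & 0xFFFFFFF0                # low 4 bits are type bits
    if decoder == 0b10:                    # 64-bit decoder: combine two BARs
        return "mem64", (next_bar << 32) | base
    return "mem32", base
```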

How does the BIOS discover the size of the requested MMIO range? It writes all 1’s to the BAR register and then reads the register back. The trailing 0’s in the value read back indicate the size of the MMIO range which is requested.

For example, let’s assume a PCIe device that needs 1 MB of MMIO address space. To find how much MMIO memory the device requires, the BIOS writes 0xFFFFFFFF to the BAR register. Then, when it reads back the same BAR register, it will get 0xFFF00000(5 hex zeros = 20 bits, 2^20 = 1 MB), indicating 1 MB of memory space. So, by finding out where the first non-zero bit starts, the BIOS calculates the memory requirement. Also, MMIO needs to be naturally aligned. This means that the lower bits of the BAR register which return zeros cannot be overwritten/changed when an address is assigned, so the assigned base is always a multiple of the size.
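
The size calculation from the read-back value can be sketched like this, for a 32-bit memory BAR. The low 4 bits are masked off first because in a memory BAR they carry the type information rather than address bits. Function names are my own.

```python
def bar_size(readback: int) -> int:
    """Size requested by a 32-bit memory BAR.

    `readback` is the value read after writing 0xFFFFFFFF to the BAR.
    Returns 0 if the BAR is not implemented (reads back as all zeros).
    """
    mask = readback & 0xFFFFFFF0       # drop the low type bits
    if mask == 0:
        return 0
    return ((~mask) & 0xFFFFFFFF) + 1  # invert the writable bits, add one


def is_naturally_aligned(base: int, size: int) -> bool:
    """True if `base` is a multiple of `size` (size is a power of two)."""
    return base % size == 0


print(bar_size(0xFFF00000))                          # 1048576 bytes = 1 MB
print(is_naturally_aligned(0x90100000, 0x100000))    # True
```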

The next step is to see how the Root Port(to which the device is connected) knows which MMIO range is assigned to its connected device. For this, the BIOS has to program the root port’s config registers. A root port has a type 1 header, which has memory base/limit fields as well as pre-fetchable memory base/limit fields. The BIOS programs these base and limit registers with the start and the end of the MMIO address range. Now, whenever a transaction comes to the root port, it checks whether the address is within this base and limit range(this range is usually called the aperture), and the RP claims the transaction if it is in the range.

Now, what if there are multiple devices connected to a single root port? How are the base and limit register values set then? In that case the BIOS has to assign addresses to all the devices connected to that RP as consecutively as possible, within the alignment constraints, and then program the entire address range of all the connected devices. So, the base and limit registers should encompass the entire MMIO range that is claimed by all the devices behind a root port.
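
One possible way to do this packing is sketched below. It is a simplified illustration, not the algorithm any particular BIOS uses: devices are placed largest first so that natural alignment wastes little space, and the aperture is rounded out to 1 MB, which is the granularity of the memory base/limit registers of a bridge. The device names, sizes and the starting address are made up.

```python
from typing import Dict, Tuple

MB = 1 << 20


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to a multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def assign_behind_root_port(
    start: int, sizes: Dict[str, int]
) -> Tuple[Dict[str, int], int, int]:
    """Place every device and return (bases, aperture base, aperture limit).

    `start` is assumed to be 1 MB aligned and every size a power of two.
    """
    cursor = start
    bases = {}
    # Largest first: each base is naturally aligned with minimal padding.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        cursor = align_up(cursor, size)
        bases[name] = cursor
        cursor += size
    limit = align_up(cursor, MB) - 1      # last address covered by the window
    return bases, start, limit


bases, base, limit = assign_behind_root_port(
    0x9000_0000, {"nic": 1 * MB, "gpu": 16 * MB, "ssd": 16 * 1024}
)
print({name: hex(addr) for name, addr in bases.items()}, hex(base), hex(limit))
```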

Demo of BAR and aperture

Demo of BAR and aperture using Read&Write utility – https://youtu.be/diQUJ1kFoS4
