/Xv6 Rust 0x02/ - printf!("Hello xv6-rust!")

With the foundation established in the previous article, we now have a working Rust environment on RISC-V.
In this second installment, we'll examine the actual xv6 code, handle the initialization from machine level to supervisor level, and ultimately implement the printf!() macro.
1. Short but important assembly code
In our current implementation, we designated main() as the entry point, performing just one crucial operation before executing test code: initializing the stack pointer.
However, xv6 performs additional initialization steps by placing the earliest boot code in a dedicated assembly file named "entry.S" (some code/comment truncation may occur; see the linked file for complete content):
1 | ### entry.S |
The code above sets up a 4096-byte stack for every hart. The start address of the stack, which we set to 0x80001000 in our previous code, now comes from a constant, stack0, located in start.rs.
1 | // start.rs |
stack0 is defined as a u8 array of size 4096*NCPU, ensuring sufficient kernel stack space for each hardware thread. During compilation, this array is placed in the .rodata section, with its address mapped within the accessible memory range.
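To make the per-hart layout concrete, here is a minimal sketch of the address arithmetic, assuming xv6-rust follows upstream xv6 and points each hart at the top of its own 4096-byte slice of stack0. The function name `stack_top` and the value `NCPU = 8` are assumptions for illustration, not taken from the repository.

```rust
/// Assumed hart limit (upstream xv6 uses 8); illustrative only.
const NCPU: usize = 8;
/// Per-hart stack size, as described in the text.
const STACK_SIZE: usize = 4096;

/// Initial stack pointer for `hartid`.
/// The stack grows downward, so each hart starts at the *top*
/// of its own slice: stack0 + (hartid + 1) * 4096.
fn stack_top(stack0: usize, hartid: usize) -> usize {
    assert!(hartid < NCPU, "hart id out of range");
    stack0 + (hartid + 1) * STACK_SIZE
}
```

With stack0 at 0x80015800, hart 0 would start with sp = 0x80016800 and hart 1 with sp = 0x80017800, so the stacks never overlap.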
We can verify this memory layout by examining the kernel
binary (compiled output of xv6-rust):
1 | readelf -s kernel | grep stack0 |
The output shows that stack0 has the address 0x80015800, and Ndx = 2 means that stack0 is located in the .rodata section.
In short, entry.S is only responsible for initializing the stack pointer before jumping directly to the Rust code.
Finally, don't forget to update the program entry, as well as the text section, in entry.ld:
1 | ... ... |
Then declare entry.S in main.rs:
1 | // main.rs |
2. Machine -> Supervisor
The last line of the assembly, call start, brings us to start(). Here is the core part of start():
1 | // start.rs |
Technically, this barely qualifies as Rust code, since most functions (like r_mstatus() or w_mepc()) are thin wrappers around inline assembly instructions, as seen in the repository's implementation.
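As a rough illustration of what such a thin wrapper can look like, here is a sketch of `r_mstatus()` and `w_mstatus()` built on `core::arch::asm!`. The exact bodies in the repository may differ. The non-RISC-V fallback, which stores the value in an atomic, is purely an assumption added so the sketch can be compiled and tried on a host machine.

```rust
// On RISC-V, each wrapper is a single CSR instruction.
#[cfg(target_arch = "riscv64")]
fn r_mstatus() -> u64 {
    let x: u64;
    // csrr: read the CSR into a general-purpose register.
    unsafe { core::arch::asm!("csrr {}, mstatus", out(reg) x) };
    x
}

#[cfg(target_arch = "riscv64")]
fn w_mstatus(x: u64) {
    // csrw: write a general-purpose register into the CSR.
    unsafe { core::arch::asm!("csrw mstatus, {}", in(reg) x) };
}

// Host-only stand-in (hypothetical): a static that plays the role of mstatus.
#[cfg(not(target_arch = "riscv64"))]
static FAKE_MSTATUS: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0);

#[cfg(not(target_arch = "riscv64"))]
fn r_mstatus() -> u64 {
    FAKE_MSTATUS.load(std::sync::atomic::Ordering::Relaxed)
}

#[cfg(not(target_arch = "riscv64"))]
fn w_mstatus(x: u64) {
    FAKE_MSTATUS.store(x, std::sync::atomic::Ordering::Relaxed)
}
```

Keeping the wrappers this small means the rest of start() can read like ordinary Rust while every call still maps to one instruction on the target.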
These functions primarily interact with RISC-V CSRs (control and status registers). CSRs are privileged-mode registers (machine/supervisor) that:
- Store system state information
- Control hardware configurations
- Are accessed via specialized CSR instructions
While the complete Privileged Specification (166 pages) covers CSRs in detail, the following table summarizes their roles in the start() function:
Register | Name | Description |
---|---|---|
mstatus | Machine Status Register | The mstatus register keeps track of and controls the hart’s current operating state. Here we only care about the MPP field, which stores the previous privilege mode: M = 11, S = 01, U = 00. In the code, the previous privilege mode is changed from machine to supervisor. (Note that the MPP field only stores the mode value; the privilege mode is not switched immediately.) |
mepc | Machine Exception Program Counter | When a trap is taken into M-mode, mepc is written with the virtual address of the instruction that was interrupted or that encountered the exception; it may also be explicitly written by software. So why does the code write the address of kmain into it? This is closely tied to the mret instruction, which we will get back to shortly. |
satp | Supervisor Address Translation and Protection | Controls supervisor-mode address translation and protection. Here we simply set it to 0 to disable virtual address translation. |
medeleg / mideleg | Machine Trap Delegation Registers | By default, all traps at any privilege level are handled in machine mode. In our code, both registers are set to 0xffff so that all traps are delegated to S-mode. |
sie | Supervisor Interrupt Enable Register | In the code, external, timer, and software interrupts are all enabled for S-mode. |
pmpaddr0 / pmpcfg0 | Physical Memory Protection | These two registers together control the access permissions for a specific address range. Here, xv6 grants RWX permission to S-mode across the range 0~0x3ffffffffffff, which covers almost 1 PiB of address space. |
tp | Thread Pointer | tp is a general-purpose register, not a CSR. As the name thread pointer suggests, it is meant for thread-local storage. That makes the code easy to understand: it stores the hart id in tp for quicker access. |
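The bit manipulation behind two of the rows above can be sketched as plain functions. The bit positions come from the RISC-V privileged specification (MPP occupies bits 12:11 of mstatus; SSIE, STIE and SEIE are bits 1, 5 and 9 of sie). The function names are hypothetical, and the real start() would feed the results to CSR write wrappers rather than return them.

```rust
const MSTATUS_MPP_MASK: u64 = 3 << 11; // both MPP bits
const MSTATUS_MPP_S: u64 = 1 << 11;    // MPP = 01, i.e. supervisor

const SIE_SEIE: u64 = 1 << 9; // supervisor external interrupts
const SIE_STIE: u64 = 1 << 5; // supervisor timer interrupts
const SIE_SSIE: u64 = 1 << 1; // supervisor software interrupts

/// Clear the MPP field, then set it to "supervisor".
/// All other mstatus bits are left untouched.
fn with_mpp_supervisor(mstatus: u64) -> u64 {
    (mstatus & !MSTATUS_MPP_MASK) | MSTATUS_MPP_S
}

/// Enable external, timer and software interrupts for S-mode.
fn with_s_interrupts_enabled(sie: u64) -> u64 {
    sie | SIE_SEIE | SIE_STIE | SIE_SSIE
}
```

Clearing the field before setting it matters: MPP holds 11 (machine) after reset, so OR-ing in 01 alone would leave it at 11.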
The mret instruction is closely related to the mepc register, as described in the table above. mret is one of the "Trap-Return Instructions", which return from a trap.
Generally speaking, when a trap such as an interrupt or exception occurs, the address of the instruction that triggered the trap is stored in the xPC register (like mepc or sepc), and the program is redirected to the trap handler for that specific trap. Once the handler has done its work and the program needs to return to the original location, it has to fetch the address from xPC, load it into the program counter, and jump back to it.
mret (and, not surprisingly, there is an sret too) does the whole process in a single instruction. In addition, it switches the privilege mode to the one saved in the MPP field of mstatus.
By now the logic of the code should be clear: first write the address of kmain to mepc, then do some other setup, and finally call mret so that the program jumps to kmain while the privilege mode switches to S-mode.
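The effect of mret can be modelled in a few lines of ordinary Rust. This is a toy model rather than xv6-rust code: the `Hart` struct, the `boot_to_supervisor` helper and the initial pc value are all made up for illustration, and the interrupt-enable bits that mret also updates (MIE/MPIE) are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    User,
    Supervisor,
    Machine,
}

/// Toy model of the state that mret touches.
struct Hart {
    mode: Mode,
    pc: u64,
    mstatus: u64,
    mepc: u64,
}

impl Hart {
    /// Model of mret: jump to mepc and drop to the mode stored in MPP.
    fn mret(&mut self) {
        let mpp = (self.mstatus >> 11) & 0b11;
        self.mode = match mpp {
            0b00 => Mode::User,
            0b01 => Mode::Supervisor,
            _ => Mode::Machine, // 0b10 is reserved; folded into Machine for brevity
        };
        self.pc = self.mepc;
        // Hardware then resets MPP to the least-privileged mode (U = 00).
        self.mstatus &= !(0b11 << 11);
    }
}

/// The sequence described above: set MPP = S, point mepc at kmain, mret.
fn boot_to_supervisor(kmain_addr: u64) -> Hart {
    let mut hart = Hart {
        mode: Mode::Machine,
        pc: 0x8000_0000, // arbitrary starting pc for the model
        mstatus: 0b11 << 11,
        mepc: 0,
    };
    hart.mstatus = (hart.mstatus & !(0b11 << 11)) | (0b01 << 11);
    hart.mepc = kmain_addr;
    hart.mret();
    hart
}
```

Calling `boot_to_supervisor(addr)` leaves the model in Supervisor mode with pc equal to `addr`, which is the state kmain finds itself in after start() finishes.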
How does RISC-V deal with the privileged mode switch?
.... RISC-V Privileged Specification Chapter 1.2 ...
A hart normally runs application code in U-mode until some trap (e.g., a supervisor call or a timer interrupt) forces a switch to a trap handler, which usually runs in a more privileged mode. The hart will then execute the trap handler, which will eventually resume execution at or after the original trapped instruction in U-mode. Traps that increase privilege level are termed vertical traps, while traps that remain at the same privilege level are termed horizontal traps. The RISC-V privileged architecture provides flexible routing of traps to different privilege layers.
.... RISC-V Privileged Specification Chapter 1.2 ...
Generally, when a trap happens, the address of the instruction that caused the trap is saved in mepc or sepc, depending on the privilege mode that handles the trap. After the trap has been handled, the handler should call either mret or sret to return to the previous mode, which is stored in the MPP or SPP field of mstatus.
3. We need UART
Once mret is executed, the program continues in a new file: main.rs. Whether it is really new is hard to say, because we already have one, but it is not exactly the same, since we will introduce a new function, kmain, to replace our previous main.
Don't be frightened by the many new functions called within kmain; we don't need them yet. The only functions we should pay attention to are Uart::init() and Console::init():
1 | // main.rs |
The QEMU generic virtual platform for RISC-V supports an "NS16550 compatible UART". Recall the memory address mapping we talked about in the last chapter:
1 | qemu-system-riscv64 -monitor stdio |
The UART address range starts at 0x10000000, and there are about 10 registers to configure and control the UART (for more details, refer to the 16550 specification).
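Because the UART registers are memory-mapped, driving the device comes down to volatile byte reads and writes at small offsets from the base address. The sketch below shows that access pattern together with a polling transmit. The offsets (THR = 0, LSR = 5) and the "transmitter idle" bit (bit 5 of LSR) follow the 16550 register layout. The struct layout and the `demo_putc` helper, which uses a plain byte array in place of the real device, are assumptions for illustration.

```rust
use core::ptr::{read_volatile, write_volatile};

const THR: usize = 0; // transmit holding register (write side of offset 0)
const LSR: usize = 5; // line status register
const LSR_TX_IDLE: u8 = 1 << 5; // THR is empty and can accept a byte

/// A UART reachable through memory-mapped registers starting at `base`.
/// On QEMU's virt machine the base would be 0x1000_0000.
struct Uart {
    base: *mut u8,
}

impl Uart {
    fn read_reg(&self, offset: usize) -> u8 {
        // Volatile access keeps the compiler from caching or eliding the read.
        unsafe { read_volatile(self.base.add(offset)) }
    }

    fn write_reg(&self, offset: usize, value: u8) {
        unsafe { write_volatile(self.base.add(offset), value) }
    }

    /// Busy-wait until the transmitter is idle, then send one byte.
    fn putc_sync(&self, c: u8) {
        while (self.read_reg(LSR) & LSR_TX_IDLE) == 0 {
            core::hint::spin_loop();
        }
        self.write_reg(THR, c);
    }
}

/// Host-side demo: an 8-byte array stands in for the register window.
fn demo_putc() -> u8 {
    let mut fake_regs = [0u8; 8];
    fake_regs[LSR] = LSR_TX_IDLE; // pretend the transmitter is ready
    let uart = Uart { base: fake_regs.as_mut_ptr() };
    uart.putc_sync(b'A');
    uart.read_reg(THR)
}
```

On real hardware, reading offset 0 returns received data rather than the byte just written; the read-back in `demo_putc` only works because the backing store here is ordinary memory.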
Let's go back to the code. All UART-related code lives in the file uart.rs. Basically, Uart::init() initializes the UART with 8-bit words, a 38.4k baud rate, and FIFOs with interrupts enabled.
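For reference, the usual 16550 initialization sequence for that configuration looks roughly like the sketch below. It mirrors what upstream xv6 does in uartinit(), so treat it as an assumption about Uart::init() rather than a copy of it. Register writes go through a caller-supplied closure (a made-up interface) so the sequence can be inspected without hardware.

```rust
const IER: usize = 1; // interrupt enable register
const FCR: usize = 2; // FIFO control register
const LCR: usize = 3; // line control register

const LCR_EIGHT_BITS: u8 = 3 << 0; // 8 data bits, no parity
const LCR_BAUD_LATCH: u8 = 1 << 7; // offsets 0/1 become the baud divisor
const FCR_FIFO_ENABLE: u8 = 1 << 0;
const FCR_FIFO_CLEAR: u8 = 3 << 1; // clear both FIFOs
const IER_RX_ENABLE: u8 = 1 << 0;
const IER_TX_ENABLE: u8 = 1 << 1;

/// Issue the register writes for 8 bits, 38.4k baud, FIFOs, interrupts on.
fn uart_init(mut write_reg: impl FnMut(usize, u8)) {
    write_reg(IER, 0x00); // interrupts off while configuring
    write_reg(LCR, LCR_BAUD_LATCH); // enter baud-rate setting mode
    write_reg(0, 0x03); // divisor low byte: 115200 / 3 = 38400
    write_reg(1, 0x00); // divisor high byte
    write_reg(LCR, LCR_EIGHT_BITS); // leave baud mode, select 8-bit words
    write_reg(FCR, FCR_FIFO_ENABLE | FCR_FIFO_CLEAR); // reset and enable FIFOs
    write_reg(IER, IER_TX_ENABLE | IER_RX_ENABLE); // turn interrupts back on
}

/// Record the (offset, value) pairs instead of touching hardware.
fn recorded_init() -> Vec<(usize, u8)> {
    let mut log = Vec::new();
    uart_init(|offset, value| log.push((offset, value)));
    log
}
```

The order matters: the divisor bytes are only reachable while the baud latch bit is set, which is why LCR is written twice.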
In fact, once initialization is done, we can directly put or get characters with the following code:
1 | // uart.rs |
Let's run a quick test that prints an "A" to the console:
1 | // main.rs |
1 | ... ... |
Awesome, we have printed the first letter! Since we can print a letter, printf!() is just around the corner.
4. printf!()
At last, we are here. So far we have output the letter "A" through the UART; next, we simply need to create a printer that calls the UART internally.
Generally speaking, the only difference between the UART and a printer is that the printer takes a format string rather than a character. The printer therefore sits at a higher abstraction level: it has to preprocess the format string into a plain string and then break that string down into characters.
As print.rs shows, the macro printf!() receives its input arguments as "format_args":
1 | // print.rs |
"format_args" allows us to print a string with parameters, such as printf!("This is a {}", "param"). The best part is that we don't need to parse the relatively complex arguments, "This is a {}" and "param", ourselves: the Rust trait core::fmt::Write takes care of all of that!
Let's go to console.rs:
1 | // console.rs |
The Write trait provides a default implementation of write_fmt, so we only need to implement write_str and output the string that has already been formatted. The string can be output by calling the UART function putc_sync.
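The whole chain, from macro to `format_args!` to `write_fmt` to `write_str` to one UART call per byte, can be tried on a host machine with a stand-in UART. In the sketch below, `FakeUart`, this `Console` struct and the `printf_to!` macro are illustrative names, not the ones in print.rs or console.rs. Only `core::fmt::Write`, `write_fmt`, `write_str` and `format_args!` are the real library pieces.

```rust
use core::fmt::{self, Write};

/// Stand-in for the UART: collects every byte it is asked to send.
struct FakeUart {
    sent: Vec<u8>,
}

impl FakeUart {
    fn putc_sync(&mut self, c: u8) {
        self.sent.push(c);
    }
}

/// The console only knows how to forward bytes to the UART.
struct Console<'a> {
    uart: &'a mut FakeUart,
}

impl Write for Console<'_> {
    /// Called by the default `write_fmt` with already-formatted pieces.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for byte in s.bytes() {
            self.uart.putc_sync(byte);
        }
        Ok(())
    }
}

/// printf-style macro: package the arguments and hand them to `write_fmt`.
macro_rules! printf_to {
    ($uart:expr, $($arg:tt)*) => {{
        let mut console = Console { uart: $uart };
        let _ = console.write_fmt(format_args!($($arg)*));
    }};
}

/// Format a message through the console and return what reached the UART.
fn demo() -> String {
    let mut uart = FakeUart { sent: Vec::new() };
    printf_to!(&mut uart, "This is a {}", "param");
    String::from_utf8(uart.sent).unwrap()
}
```

Note that `write_str` may be called several times for a single `printf_to!` invocation, once per literal or formatted piece, which is why it must append rather than overwrite.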
Finally, we can print something with printf!()!
1 | // main.rs |
And the output:
1 | ... ... |
It works!