Stack Memory in ARM Cortex-M4
Stack memory is an essential part of how microcontrollers like the ARM Cortex-M4 manage temporary data, especially during function calls and interrupt handling. If you’re just starting out in embedded systems or working with ARM processors, understanding stack memory is a must.
What is Stack Memory?
Stack memory is a special section of the main memory (RAM) that is used to temporarily store data. This data is often short-lived and needed only during certain parts of a program, such as:
- Function execution
- Interrupt or exception handling
- Saving and restoring CPU register values
Where is stack memory stored?
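- In the microcontroller's internal RAM, or in external RAM if the system has it.
- The stack region is reserved when the program is built (in the linker script or startup code), and the stack pointer is loaded at reset. After that, compiler-generated code and the hardware manage it automatically.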
How Does the Stack Work?
Stack follows a Last-In, First-Out (LIFO) rule, which means:
- The last data pushed (added) onto the stack will be the first to be popped (removed).
Think of it like a stack of plates: you add one plate on top, and when you need one, you take the top one off first.
Stack Operations: PUSH and POP
The processor uses two main instructions to work with the stack:
Instruction | Action |
---|---|
PUSH | Saves (stores) data on the stack |
POP | Restores (retrieves) data from the stack |
These instructions automatically modify the Stack Pointer (SP).
What is the Stack Pointer (SP)?
- The Stack Pointer (also called SP or R13) is a special CPU register that always points to the top of the stack.
- When you PUSH, the stack pointer decreases (stack grows downwards).
- When you POP, the stack pointer increases.
Stack Grows Down
In Cortex-M4, the stack grows from high memory to low memory.
When Is Stack Memory Used?
Let’s go through some common use cases:
1. 🧠 Temporary Storage of Register Values
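When an interrupt occurs, the processor hardware automatically stores key register values (R0–R3, R12, LR, PC, xPSR) on the stack to preserve the program state. On an ordinary function call, it is the compiler-generated code that saves whichever registers the function must preserve.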
2. 📦 Temporary Storage of Local Variables
Local variables declared inside a function (e.g., int x = 5;) are stored in stack memory unless the compiler keeps them in registers. Their space is released automatically when the function returns; the memory itself is not erased.
3. 🚨 Exception or Interrupt Context Saving
When an interrupt or system exception occurs:
- The processor automatically stores a context on the stack.
- This includes the general-purpose registers R0–R3 and R12, the processor status register (xPSR), return address (PC), and link register (LR).
- After the ISR (Interrupt Service Routine) ends, the stack is popped, and execution resumes as if nothing happened.
Stack in ARM Cortex-M4: Behind the Scenes
Main and Process Stack
Cortex-M4 has two types of stack pointers:
Stack Pointer | Usage |
---|---|
MSP | Main Stack Pointer – used by system/interrupts |
PSP | Process Stack Pointer – used by user-level code/tasks (in RTOS) |
By default, only MSP is used in bare-metal applications. In RTOS-based systems, PSP is used for thread execution.
Stack Overflow: A Word of Caution
- Stack has a fixed size (set in linker script or startup code).
- If too much data is pushed (e.g., recursive calls, large local variables), the stack can overflow into other memory areas and cause crashes.
- This is why it’s important to allocate enough stack space and use tools to monitor usage.
Example: What Happens During a Function Call
Here’s what typically happens when a function is called:
- The return address is placed in LR by the call instruction (BL); the function saves LR on the stack if it calls another function itself.
- Local variables are allocated on the stack.
- Registers may be saved to preserve state.
When the function returns:
- Registers are restored.
- Local variables are removed (stack pointer moves back).
- Execution continues from the saved return address.
Summary
Feature | Description |
---|---|
Location | Internal or external RAM |
Access Style | Last In, First Out (LIFO) |
Accessed With | PUSH/POP or LDR/STR instructions |
Tracked Using | Stack Pointer (SP/R13) |
Stack Growth | From higher to lower memory addresses |
Used For | Function calls, interrupts, local variable storage |
Stack Pointers in Cortex-M4 | MSP (default), PSP (used in RTOS) |
What is RAM used for in Embedded Systems?
Imagine RAM like a shelf in your workspace. Different parts of it are used for different tasks:
- Global Data Area:
- This section stores global variables and static local variables.
- It is available throughout the entire program.
- Think of it like a drawer where you keep important items that you’ll use again and again.
- Heap:
- This area is used for dynamic memory allocation (e.g., using malloc in C).
- The size of data here is not fixed; it grows during the program’s execution when needed.
- Think of it like a box where you add things as you go.
- Stack:
- This is used during function calls.
- It stores temporary data like:
- Function parameters
- Local variables
- Return addresses
- Interrupt frames
- Imagine it like a stack of books. You add a book on top when you go into a function, and remove it when the function is done.
How Does the Stack Work in ARM Cortex Mx Processors?
ARM Cortex Mx processors use a Full Descending (FD) stack model.
Let’s break that down:
- Full: The memory address it points to is already in use.
- Descending: The stack grows downwards in memory (from higher addresses to lower addresses).
🧠 Imagine your stack is a set of boxes on a shelf, and every time you add a new box, you place it below the last one, and it already has stuff inside.
📌 Different Types of Stack Operation Models
There are 4 common stack models based on how they grow and whether the top pointer points to an empty or full location:
Stack Type | Meaning |
---|---|
Full Ascending (FA) | Stack grows up, and the top points to a full slot. |
Full Descending (FD) | Stack grows down, and the top points to a full slot. (Used by ARM Cortex Mx) |
Empty Ascending (EA) | Stack grows up, and the top points to an empty slot. |
Empty Descending (ED) | Stack grows down, and the top points to an empty slot. |
📌 Summary for ARM Cortex Mx:
- Uses Full Descending (FD) stack.
- Stack grows from high memory to low memory.
- Top of the stack points to a valid (already used) memory location.
What is Stack Placement?
When a program runs on a microcontroller (like ARM Cortex-M), memory (RAM) is divided into different parts:
- Data section (for global/static variables),
- Heap (for dynamically allocated memory),
- Stack (for temporary data in functions),
- And the unused area.
The stack grows in a specific direction depending on how the memory is managed in the system.
📌 Two Types of Stack Placement (as shown in the top image)
✅ Type 1 – Stack at the Lower Memory Side
- Data section is placed at the bottom.
- Heap grows upwards (towards higher memory addresses).
- Stack grows downwards (from high to low memory addresses).
- Stack and heap are placed opposite to each other to avoid collision.
🧠 Think of it like:
Heap starts from the bottom and grows up.
Stack starts from the top and grows down.
✅ Type 2 – Stack at the Top (High Address Side)
- Stack starts at the top of RAM and grows down.
- Heap grows up from below the stack.
- Still prevents overlap between heap and stack.
🧠 This is common in embedded systems like ARM Cortex-Mx.
Changing Stack Pointer (SP) to PSP in Thread Mode
Let’s understand the bottom part of the image.
🔹 What is SP (Stack Pointer)?
- It’s a register that always points to the top of the current stack.
🔹 What are PSP and MSP?
- MSP (Main Stack Pointer): Used by default after reset and in Handler mode (interrupts, exceptions).
- PSP (Process Stack Pointer): Used in Thread mode (normal application code), for better memory separation.
🔄 Why Change SP to PSP?
- To separate interrupt stack (using MSP) from application stack (using PSP).
- Helps avoid accidental stack overflow between ISR and user code.
🔁 Summary of SP Switching:
Mode | Stack Pointer Used |
---|---|
Handler Mode (interrupts) | MSP |
Thread Mode (application) | MSP by default; PSP after you switch to it manually |
You can switch from MSP to PSP in thread mode using a special register (CONTROL register).
📦 Stack Region Mapping (from image):
STACK_MSP_START (highest address)
↓ (MSP stack grows down)
---------------------
(boundary between the MSP region and the PSP region)
---------------------
↓ (PSP stack also grows down)
STACK_PSP_END (lowest address)
MSP and PSP can even share the same 1KB stack memory, but they must not overlap at runtime.
In Simple Terms:
- Stack placement can vary, but it always grows down in ARM Cortex-M.
- Stack and heap are placed in opposite directions to avoid clash.
- ARM Cortex-M uses MSP by default but can switch to PSP in thread mode.
- Separating PSP and MSP improves reliability, especially in RTOS or multitasking systems.
Code Example: Switching from MSP to PSP
This code assumes you are writing for a bare-metal ARM Cortex-M environment.
#include <stdint.h>

// Allocate memory for the process stack (addressed through PSP)
#define PSP_STACK_SIZE 0x100 // 256 bytes
__attribute__((aligned(8))) static uint8_t psp_stack[PSP_STACK_SIZE];

// Forced inline so that no function epilogue runs after SP has changed.
// Call it only from a function that never returns (such as main), and
// before that function keeps any local data on the stack.
__attribute__((always_inline)) static inline void switch_to_psp(void) {
    // Step 1: Calculate the top address of the PSP stack
    uint32_t psp_top = (uint32_t)(psp_stack + PSP_STACK_SIZE);

    __asm volatile(
        // Step 2: Load the PSP with this address
        "msr psp, %0 \n"
        // Step 3: Change CONTROL register to use PSP in thread mode
        // Set bit 1 to 1 (use PSP) and keep bit 0 at 0 (privileged mode)
        "mov r0, #2 \n" // CONTROL.SPSEL = 1
        "msr control, r0 \n"
        "isb \n" // Instruction Synchronization Barrier
        :
        : "r" (psp_top)
        : "r0", "memory"); // r0 is overwritten, so tell the compiler
}

int main(void) {
    // Initially MSP is used after reset
    // Switch to PSP for thread mode
    switch_to_psp();
    while (1) {
        // Application code using PSP
    }
}
Explanation:
Step | What it Does |
---|---|
psp_stack | Allocates memory (256 bytes) for the process stack. |
psp_top | Points to the top of the stack (stack grows downward). |
msr psp, %0 | Moves the value to the PSP register. |
mov r0, #2 | Sets bit 1 of the CONTROL register to 1 to use PSP. |
msr control, r0 | Applies the new CONTROL settings. |
isb | Ensures the CPU uses the new stack pointer immediately. |
Output (Expected Behavior):
- After running switch_to_psp(), your application (thread mode) will use the PSP instead of the default MSP.
- Interrupts and exceptions will still use MSP, which is safer and prevents corruption.
1. Physically Two Stack Pointers
Cortex-M processors have two separate stack pointer registers:
- MSP (Main Stack Pointer)
- PSP (Process Stack Pointer)
These are hardware registers, not just variables in RAM.
2. Main Stack Pointer (MSP)
- Default stack pointer after reset
- Used by:
- Interrupt handlers (exceptions, faults)
- System-level code
- Thread mode if PSP is not enabled
- MSP is automatically loaded from the first word of the vector table on reset.
3. Process Stack Pointer (PSP)
- Used only in thread mode
- Useful for user-level tasks or application code
- Typically used in RTOS-based applications to give each task its own stack
4. Switching Between MSP and PSP
- By default, the CPU uses MSP.
- You can switch to PSP in Thread mode by setting bit 1 (SPSEL) of the CONTROL register with the MSR instruction, followed by an ISB.
5. Changing or Accessing Stack Pointers
- Use assembly instructions:
- MRS: Read MSP/PSP
- MSR: Write MSP/PSP
Example (in ARM assembly):
MRS R0, MSP ; Read MSP into R0
MRS R1, PSP ; Read PSP into R1
MSR MSP, R2 ; Write R2 to MSP
6. Changing Stack Pointer in C Code
- Use naked functions (no prologue/epilogue), e.g. in GCC:
__attribute__((naked)) void switch_to_psp(uint32_t pspValue) {
    __asm volatile (
        "MSR PSP, r0 \n" // Set PSP (pspValue arrives in r0)
        "MOV r0, #0x02 \n"
        "MSR CONTROL, r0 \n" // Switch to PSP
        "ISB \n"
        "BX LR \n"
    );
}
Summary Table
Feature | MSP | PSP |
---|---|---|
Default After Reset | ✅ Yes | ❌ No |
Used in Handler Mode | ✅ Yes | ❌ No |
Used in Thread Mode | ✅ Yes (by default) | ✅ Yes (if configured) |
Common Usage | System/Interrupt code | Application tasks/RTOS tasks |
Switch Using | CONTROL register (bit 1) | CONTROL register (bit 1) |
AAPCS
Imagine you’re building with LEGOs, and your friend is building another part of the same model separately. For your LEGO pieces to fit together perfectly, you both need to follow the same set of instructions, right? The Procedure Call Standard for the Arm Architecture (AAPCS) is like that set of instructions, but for computer programs running on ARM-based devices (like many smartphones and embedded systems).
What is AAPCS and Why Do We Need It?
In simple terms, AAPCS is a standard or a set of rules that defines how different pieces of code, called subroutines or functions, “talk” to each other.
Think about a program as a team working on a project. One team member (a function) might need another team member (another function) to do a specific task. When the first function “calls” the second function, they need a clear agreement on how to pass information, who is responsible for what, and how to hand back the results.
The AAPCS makes sure that functions written by different people, or even compiled by different tools, can work together seamlessly. Without it, it would be like one LEGO builder using round pegs and another using square holes – things just wouldn’t connect!
What Does the AAPCS Define? The Contract Between Functions
The AAPCS sets up a “contract” between the function that makes the call (the caller) and the function that gets called (the callee). This contract includes:
- Obligations on the Caller:
- The caller must set things up in a specific way before the called function can start. This includes putting any necessary data (called arguments or parameters) in agreed-upon places (like specific processor “slots” called registers).
- Obligations on the Callee (the Called Routine):
- If the called function needs to use certain resources that the caller was also using (like some of those processor “slots” or registers), it must save the caller’s original values before using them.
- Before it finishes, it must restore those saved values so the caller isn’t surprised by unexpected changes.
- Rights of the Callee:
- The called function has permission to use certain resources and change certain parts of the program’s state to do its job.
A Peek into the Rules: Registers in ARM
One of the key areas AAPCS defines is how registers are used. Registers are small, super-fast storage locations within the ARM processor.
According to the AAPCS:
- Registers R0, R1, R2, R3: These are used to pass the first four arguments to a function, and R0 (together with R1 for 64-bit values) carries the result back. A function can modify these registers without needing to save their previous values.
- Register R14 (LR – Link Register): This special register holds the “return address” – it tells the function where to go back to in the caller’s code once it’s finished. A function can modify this (for example, if it calls another function itself).
- PSR (Program Status Register): This holds information about the current state of the program (like if the last calculation was zero). A function can modify this.
- Registers R4 to R11: These are considered “callee-saved” or “preserved” registers. If a function wants to use any of these registers, it must first save their current contents (e.g., push them onto a temporary storage area called the stack). Before the function finishes and returns to the caller, it must restore these registers to their original values. This ensures that the caller’s context is not disturbed.
Who Follows These Rules?
When a ‘C’ compiler (a tool that translates human-readable C code into machine code for the ARM processor) generates code, it must follow the AAPCS specification. This ensures that the compiled C functions can correctly call other functions, including those that might be part of the operating system or other libraries, which also adhere to AAPCS.
In Summary
The AAPCS is a fundamental agreement that allows different parts of an ARM program to communicate and cooperate effectively. It defines:
- How functions call each other.
- How data (arguments and return values) is passed.
- Which registers can be used freely and which must be preserved.
By having this standard, developers can write modular code, use libraries from different sources, and be confident that everything will work together smoothly on ARM-powered devices. It’s a crucial part of the Application Binary Interface (ABI) that makes the ARM ecosystem robust and interoperable.
Stack Activities During Interrupts and Exceptions
When an interrupt or exception occurs in an ARM Cortex-M processor:
What Gets Automatically Pushed to Stack
The processor automatically saves the following registers on the current stack (usually MSP unless PSP is configured for the thread):
- R0 – R3: Argument and temporary registers
- R12: Intra-procedure-call scratch register
- LR: Link register of the interrupted code
- PC: Program counter (the return address, where execution resumes after the handler)
- xPSR: Program status register
This is done to preserve the CPU state before jumping to the interrupt handler.
Why This is Done
This mechanism ensures that when the handler completes and performs an exception return, all the saved registers can be restored by hardware, and the processor can resume execution exactly from where it was interrupted.
Implication for C Handlers
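Because the hardware saves exactly the registers that the AAPCS lets a function overwrite (R0–R3, R12, LR, xPSR), and the compiler preserves R4–R11 as it does for any other function, ordinary C functions can safely be used as interrupt handlers without manual assembly code for saving/restoring state.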
Stack Initialization Tips
Proper stack configuration is essential in embedded systems to avoid hard faults, memory corruption, or unpredictable behavior.
1. Estimate Stack Size
- Analyze the worst-case stack usage of your application: recursive functions, deep function calls, RTOS tasks, etc.
- Use stack usage analysis tools or test under high load.
2. Understand the Stack Growth Model
- Full Descending (FD) – Common in ARM (stack grows down, points to valid data)
- Others: Full Ascending (FA), Empty Descending (ED), Empty Ascending (EA) — Know which one applies to your CPU.
3. Choose Stack Placement
- Decide where to place the stack:
- At end of internal RAM (typical)
- In external memory (SDRAM) if needed
- Avoid placing near heap or globals unless separated by enough guard space.
4. Two-Stage Stack Initialization
- In systems with external SDRAM:
- Start with stack in internal RAM
- Initialize SDRAM in startup code or main()
- Then switch stack pointer (MSP) to use SDRAM
5. Vector Table and MSP Initialization
- The first word in the vector table must contain the initial value of the Main Stack Pointer (MSP).
- The startup file places this value in the vector table, and the processor loads it into MSP at reset, before main() is called.
6. Configure Linker Script
- Use your linker script to:
- Define stack start and size
- Place symbols like _estack, _stack_size, etc.
- The startup code reads these symbols to initialize the stack pointer.
7. RTOS Considerations
- RTOS kernel uses:
- MSP for system-level tasks (e.g., ISRs, scheduler)
- PSP (Process Stack Pointer) for user threads
- Make sure to switch to PSP in thread mode to separate kernel and user stack.
Summary Table
Topic | Key Point |
---|---|
Registers auto-saved | R0–R3, R12, LR, PC, xPSR |
Stack for exceptions | Handled by hardware (MSP/PSP) |
Stack placement | Internal RAM or SDRAM |
Vector table initialization | First entry = Initial MSP |
Linker role | Defines stack boundaries |
RTOS stack model | MSP = kernel, PSP = user tasks |
Stack Growth
In most embedded systems (especially ARM Cortex-M), the stack grows downward — that is, toward lower memory addresses.
Push (Grow)
When a function is called or data is pushed:
- SP (Stack Pointer) is decremented.
- New data is stored at the new lower address.
Example:
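Suppose the stack starts at 0x20001000 and R0, R1, and R2 are pushed one at a time:
Address | Value |
---|---|
0x20001000 | ← Initial SP |
0x20000FFC | R0 |
0x20000FF8 | R1 |
0x20000FF4 | R2 ← SP after the pushes |
- Pushing 3 registers caused the SP to move from 0x20001000 to 0x20000FF4.
- A single PUSH {R0-R2} instruction ends at the same SP but stores the lowest-numbered register at the lowest address, so R0 would land at 0x20000FF4.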
Stack Shrink
When a function returns or values are popped:
- The SP is incremented.
- This frees up the memory that was used.
Pop (Shrink)
Continuing from the example:
Address | Value |
---|---|
0x20001000 | ← SP after popping all three registers |
Visual Summary
High Address
+----------+ ← Stack base (initial SP)
| Data |
+----------+
| Data |
+----------+
| Data |
+----------+ ← Stack Top (SP)
(Empty area)
↓ stack grows (pushes) / ↑ stack shrinks (pops)
Low Address
- Stack grows down (toward low addresses) when pushing.
- Stack shrinks up (toward high addresses) when popping.
On ARM Cortex-M:
- MSP or PSP is used depending on context (handler mode or thread mode).
- Stack pointer is always aligned to word boundaries (4 bytes); the AAPCS additionally requires 8-byte alignment at function call boundaries.
- During interrupts, the CPU pushes context automatically (stack grows).
- On return from interrupt, the CPU pops context (stack shrinks).
Common Stack Operations in C
Example Function Call:
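void bar(void); // defined elsewhere

void foo(void) {
    int a = 5; // local variable -> stored on stack (or kept in a register when optimized)
    bar();     // function call -> BL puts the return address in LR, so foo first saves its own LR on the stack
}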
What Happens in Stack:
- Space allocated for a
- LR (foo's own return address) pushed, because the call to bar() overwrites LR
- Stack grows
- On function return, stack shrinks (SP adjusted back)
Stack Overflow / Underflow
- Overflow: Stack grows beyond its allocated space → corrupts other memory
- Underflow: Trying to pop from an empty stack → unpredictable behavior
Key Points
Action | Effect on SP | Direction |
---|---|---|
Push | Decrement | Down |
Pop | Increment | Up |
Function Call | Push return addr + locals | Stack grows |
Return | Pop return addr | Stack shrinks |