1.3 Process Management & Scheduling
š Chapter Summary
Key Concepts: Process lifecycle, fork/exec, CFS scheduler, Real-Time scheduling, PREEMPT_RT, priority inversion
What You'll Learn:
- Understand how processes are created (fork/exec) and destroyed
- Explain the difference between CFS and Real-Time scheduling
- Apply real-time priorities to robotics control loops
- Debug process issues using Linux tools
Prerequisites: Chapter 1.1 (Kernel theory), Chapter 1.2 (File systems)
Time Estimate: 55 minutes (40 min reading + 15 min exercises)
1. The Process: A Universe in Isolationā
A process is an instance of a running program. It's a complete universe with:
- Its own memory space (virtual address space)
- Its own file descriptors
- Its own CPU registers
- Its own execution state
Process vs Threadā
- Process: Heavy isolation, separate memory space, expensive to create
- Thread: Lightweight, shared memory space, cheap to create
Robotics Pattern:
- Separate processes for: Vision, Planning, Control (isolation for safety)
- Threads within Control process for: Sensor reading, Motor commands, Safety checks (shared state)
2. The Birth of a Process: fork() and exec()ā
In Unix, there's only ONE way to create a process: fork().
fork() creates an exact copy of the current process. The child process gets:
- A copy of the parent's memory
- A copy of the parent's file descriptors
- A copy of the parent's CPU state
import os
import time
print(f"Parent process PID: {os.getpid()}")
# Fork creates a child process
pid = os.fork()
if pid == 0:
# This code runs in the CHILD process
print(f" Child process PID: {os.getpid()}")
print(f" Child's parent PID: {os.getppid()}")
time.sleep(1)
print(" Child exiting...")
else:
# This code runs in the PARENT process
print(f"Parent created child with PID: {pid}")
print("Parent waiting for child...")
os.waitpid(pid, 0) # Wait for child to finish
print("Parent: Child has exited")
The Magic of fork()ā
fork() returns twice:
- In the parent: Returns child's PID
- In the child: Returns 0
This is how you distinguish parent from child.
exec(): Replacing the Processā
After fork(), the child is an exact copy of the parent. To run a different program, use exec():
import os
pid = os.fork()
if pid == 0:
# Child process
print("Child: About to exec...")
os.execvp('ls', ['ls', '-l']) # Replace child with 'ls' command
# This line never executes (process is replaced)
print("This never prints")
else:
# Parent process
os.waitpid(pid, 0)
print("Parent: Child finished")
The fork/exec Pattern:
fork()- Create child process- Child calls
exec()- Replace child with new program - Parent calls
wait()- Wait for child to finish
This is how your shell runs commands!
3. Process States: The Lifecycleā
A process transitions through several states during its lifetime:
State Descriptionsā
- Ready (R): Process is ready to run, waiting for CPU
- Running (R): Process is currently executing on CPU
- Waiting/Sleeping (S/D):
- S (Interruptible Sleep): Waiting for I/O, can be interrupted by signals
- D (Uninterruptible Sleep): Waiting for I/O, cannot be interrupted (usually disk I/O)
- Zombie (Z): Process has exited, but parent hasn't read exit status yet
- Stopped (T): Process stopped by signal (e.g., Ctrl+Z)
Viewing Process Statesā
import subprocess
import os
# Get current process state
result = subprocess.run(['ps', '-o', 'pid,state,comm', '-p', str(os.getpid())],
capture_output=True, text=True)
print("Current process state:")
print(result.stdout)
# Show all processes with their states
result = subprocess.run(['ps', 'aux'], capture_output=True, text=True)
lines = result.stdout.split('\n')[:10] # First 10 processes
print("\nFirst 10 processes:")
for line in lines:
print(line)
4. The Scheduler: Who Gets to Run?ā
The scheduler decides which process runs next. Linux has two main scheduling classes:
4.1 CFS: Completely Fair Scheduler (Default)ā
CFS tries to give every process an equal share of CPU time. It uses a Red-Black Tree to track which process has been "unfairly" treated.
Algorithm:
- Track how much CPU time each process has used (
vruntime) - Pick the process with the lowest
vruntime(left-most node in tree) - Run it for a time slice
- Update its
vruntime - Repeat
Why CFS is Bad for Robotics:
- Fairness ā Determinism
- A low-priority log compressor can delay your control loop
- No guarantees on when your process will run
4.2 Real-Time Scheduling (SCHED_FIFO, SCHED_RR)ā
Real-Time scheduling gives you deterministic behavior. You assign priorities (1-99), and the scheduler always runs the highest-priority ready process.
SCHED_FIFO (First-In-First-Out):
- Process runs until it blocks or yields
- No time slicing
- Use for: Control loops, sensor reading
SCHED_RR (Round-Robin):
- Like FIFO, but with time slicing among same-priority processes
- Use for: Multiple real-time tasks at same priority
import os
import sched
# Note: Setting real-time priority requires root privileges
# This is a demonstration of the API
def set_realtime_priority(priority):
"""Set current process to real-time scheduling"""
try:
# SCHED_FIFO = 1, priority range: 1-99
param = os.sched_param(priority)
os.sched_setscheduler(0, os.SCHED_FIFO, param)
print(f"Set to SCHED_FIFO with priority {priority}")
except PermissionError:
print("Need root privileges to set real-time priority")
print("Run with: sudo python script.py")
# Get current scheduling policy
policy = os.sched_getscheduler(0)
policy_names = {
0: "SCHED_OTHER (CFS)",
1: "SCHED_FIFO (Real-Time)",
2: "SCHED_RR (Real-Time Round-Robin)",
}
print(f"Current policy: {policy_names.get(policy, 'Unknown')}")
# Attempt to set real-time priority
set_realtime_priority(50)
5. Real-Time Robotics: The Control Loopā
A typical robot control loop runs at 100-1000 Hz. At 1000 Hz, you have 1 millisecond per iteration.
The Problem with Standard Linuxā
import time
def control_loop():
"""Simulated control loop"""
target_hz = 1000
period = 1.0 / target_hz # 1ms
jitter_samples = []
for i in range(100):
start = time.time()
# Simulate control work
sensor_value = read_sensor() # 0.1ms
control_output = compute_control(sensor_value) # 0.2ms
send_to_motor(control_output) # 0.1ms
# Sleep until next period
elapsed = time.time() - start
sleep_time = period - elapsed
if sleep_time > 0:
time.sleep(sleep_time)
actual_period = time.time() - start
jitter = abs(actual_period - period) * 1000 # ms
jitter_samples.append(jitter)
avg_jitter = sum(jitter_samples) / len(jitter_samples)
max_jitter = max(jitter_samples)
print(f"Average jitter: {avg_jitter:.3f} ms")
print(f"Max jitter: {max_jitter:.3f} ms")
if max_jitter > 0.5:
print("ā ļø Jitter too high for stable control!")
# Placeholder functions
def read_sensor(): time.sleep(0.0001); return 42
def compute_control(val): time.sleep(0.0002); return val * 1.5
def send_to_motor(val): time.sleep(0.0001)
control_loop()
Problems:
time.sleep()is not precise (can sleep longer than requested)- Other processes can preempt your control loop
- Jitter (variation in loop timing) causes instability
The Solution: PREEMPT_RT + Real-Time Priorityā
import os
import time
import ctypes
# Load C library for high-resolution timing
libc = ctypes.CDLL('libc.so.6')
class TimeSpec(ctypes.Structure):
_fields_ = [('tv_sec', ctypes.c_long), ('tv_nsec', ctypes.c_long)]
def clock_nanosleep(seconds, nanoseconds):
"""High-precision sleep using clock_nanosleep"""
ts = TimeSpec(int(seconds), int(nanoseconds))
libc.clock_nanosleep(0, 0, ctypes.byref(ts), None)
def realtime_control_loop():
"""Control loop with real-time scheduling"""
# Set real-time priority (requires root)
try:
param = os.sched_param(80) # High priority
os.sched_setscheduler(0, os.SCHED_FIFO, param)
print("ā Real-time priority set")
except PermissionError:
print("ā ļø Need root for real-time priority")
target_hz = 1000
period_ns = int(1e9 / target_hz) # 1ms in nanoseconds
for i in range(100):
start_ns = time.time_ns()
# Control work
sensor_value = read_sensor()
control_output = compute_control(sensor_value)
send_to_motor(control_output)
# Precise sleep
elapsed_ns = time.time_ns() - start_ns
sleep_ns = period_ns - elapsed_ns
if sleep_ns > 0:
clock_nanosleep(0, sleep_ns)
print("Control loop complete")
# Run (will warn if not root)
realtime_control_loop()
Key Improvements:
- SCHED_FIFO - Preempts all non-real-time processes
- clock_nanosleep() - Nanosecond precision sleep
- PREEMPT_RT kernel - Allows preemption even in kernel code
6. Priority Inversion: The Mars Pathfinder Bugā
In 1997, Mars Pathfinder experienced random resets. The cause? Priority inversion.
The Scenarioā
High Priority Task (Control): Needs Resource A
Medium Priority Task (Telemetry): Running
Low Priority Task (Logging): Has Resource A locked
What Happens:
- Low priority task locks Resource A
- High priority task tries to lock Resource A ā blocks
- Medium priority task preempts low priority task
- High priority task is stuck waiting for low priority task
- Priority inversion: Medium priority task effectively blocks high priority task!
The Solution: Priority Inheritanceā
When a high-priority task blocks on a resource held by a low-priority task, temporarily boost the low-priority task's priority.
import threading
import time
# Simulate priority inversion
lock = threading.Lock()
def low_priority_task():
print("Low: Acquiring lock...")
with lock:
print("Low: Got lock, working...")
time.sleep(2) # Simulate work
print("Low: Releasing lock")
def high_priority_task():
time.sleep(0.1) # Let low priority task get lock first
print("High: Trying to acquire lock...")
with lock:
print("High: Got lock!")
# Without priority inheritance, high priority task waits
low_thread = threading.Thread(target=low_priority_task)
high_thread = threading.Thread(target=high_priority_task)
low_thread.start()
high_thread.start()
low_thread.join()
high_thread.join()
print("\nā ļø In real-time systems, use priority inheritance mutexes!")
print("In C: pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT)")
Robotics Lesson:
- Always use priority inheritance for locks shared between different priority tasks
- Minimize lock holding time in high-priority tasks
- Consider lock-free data structures for critical paths
7. Debugging Processes: /proc and psā
The /proc filesystem is a window into the kernel's process table.
import os
# Explore /proc for current process
pid = os.getpid()
proc_dir = f"/proc/{pid}"
print(f"Process {pid} information:\n")
# Read process status
try:
with open(f"{proc_dir}/status", 'r') as f:
lines = f.readlines()[:15] # First 15 lines
for line in lines:
print(line.strip())
except FileNotFoundError:
print("(Running on non-Linux system, /proc not available)")
# Show command line
try:
with open(f"{proc_dir}/cmdline", 'r') as f:
cmdline = f.read().replace('\x00', ' ')
print(f"\nCommand line: {cmdline}")
except FileNotFoundError:
pass
# Show environment variables (first 5)
try:
with open(f"{proc_dir}/environ", 'r') as f:
env_vars = f.read().split('\x00')[:5]
print("\nFirst 5 environment variables:")
for var in env_vars:
if var:
print(f" {var}")
except FileNotFoundError:
pass
Useful /proc Filesā
/proc/[pid]/status- Process status (state, memory, threads)/proc/[pid]/cmdline- Command line arguments/proc/[pid]/fd/- Open file descriptors/proc/[pid]/maps- Memory mappings/proc/[pid]/stat- Process statistics (CPU time, priority)/proc/[pid]/sched- Scheduling information
Deep FAQā
Q: What's the difference between a zombie and an orphan process?
A:
- Zombie: Process has exited, but parent hasn't called
wait()yet. It's a corpse in the process table. - Orphan: Process whose parent has died. It's adopted by
init(PID 1), which will reap it.
Q: Why does my control loop have jitter even with SCHED_FIFO?
A: Common causes:
- Not using PREEMPT_RT kernel - Standard kernel has non-preemptible sections
- Cache misses - Code/data not in CPU cache
- Interrupts - Hardware interrupts can delay your code
- SMIs (System Management Interrupts) - BIOS-level interrupts you can't control
Q: How do I choose the right real-time priority?
A: Priority hierarchy for robotics:
- 90-99: Emergency stop, safety checks
- 70-89: Control loops (higher frequency = higher priority)
- 50-69: Sensor reading, motor commands
- 30-49: Planning, vision processing
- 1-29: Logging, telemetry
Q: Can I use Python for real-time control?
A: Python has limitations:
- Garbage collection - Can cause unpredictable pauses
- GIL (Global Interpreter Lock) - Limits true parallelism
- Interpreted - Slower than compiled code
Recommendation:
- Use C/C++ for hard real-time control loops (< 1ms jitter required)
- Use Python for soft real-time (< 10ms jitter acceptable)
- Use Python for planning, vision, high-level logic
Exercisesā
Exercise 1: Process Creationā
Write a Python script that:
- Forks 3 child processes
- Each child prints its PID and sleeps for a random time (1-3 seconds)
- Parent waits for all children to finish
- Parent prints total execution time
Exercise 2: Scheduling Policy Comparisonā
Create two CPU-intensive tasks:
- Run both with SCHED_OTHER (default)
- Run one with SCHED_FIFO priority 50, one with SCHED_OTHER
- Measure CPU time each task gets
- Explain the difference
Exercise 3: Real-Time Control Loopā
Implement a control loop that:
- Runs at 100 Hz (10ms period)
- Measures actual loop timing
- Reports average, min, max, and standard deviation of loop period
- Compare with and without SCHED_FIFO
Further Readingā
Next Chapter: 1.4 Networking Fundamentals Related: 1.1 Linux & OS Theory | 1.2 File Systems