Every four months, we on the BlackBerry® QNX® OS kernel team have a new group of co-op students (AKA Interns) come in to help us out with kernel and safety-related work. Over the last four years, we've had anywhere from six to 14 people per term, from a range of schools and with diverse sets of courses and previous co-op terms under their belts.
A few years ago, to establish a common foundation and reference, we put together a training course for our new co-ops. This training course is in addition to the standard programming with QNX® training we offer all new technical hires. Over time, we’ve expanded this course and the materials we provide to fill in gaps and address feedback, as well as to bring in discussions of new technology that is becoming more commonly used.
As word of the kernel team's co-op training spread to other teams at BlackBerry QNX, and once we felt the training was mature enough, we began inviting co-ops from other teams to join. Every term now begins with around 30 co-ops going through the training, and usually some full-time people too!
Safety First
Here at BlackBerry QNX, we design and build functionally safe software for critical embedded systems used in everything from medical devices to high-speed trains, so we always start our training with an introduction to safety and the idea of a safety culture. This includes emphasizing that a co-op’s job description includes asking questions, not only during training, but throughout their time here. “I think I understand” and “It seems to work most of the time” in fact mean that there is a risk something will not work, and safety is all about risk management.
As a pilot and a ground school instructor, I particularly like to briefly discuss aviation history and how accidents in the past led to the adoption of a safety culture, how successful fostering safety culture has been in that industry, and how recent accidents from the headlines emphasize that a safety culture requires constant vigilance, introspection, and improvement.
My colleague, Chris Hobbs, who is also a pilot and ground school instructor, recently published a post on this blog in which he cites C. A. R. Hoare warning of the dangers of poor programming. If you’re worried about threats to our very survival, look no further than your own code.
The Curriculum
On the more strictly technical side of things, over six days of training we cover hardware, computer architecture, advanced topics in C, including some language law (the formal language used in the specification of programming languages), POSIX, operating system concepts, the architecture of QNX systems, and finally, the main tools we use for development, including those for revision control, code review, and debugging. We provide a mix of theory and practical, real-world examples and use cases, because co-ops can expect to start applying what they learn to real problems right away.
The View from Below
Our overall goal is as much—in fact, more—to bring about a change in co-ops’ perspective as it is to instruct them how to program for BlackBerry QNX systems. For more than 20 years, introductions to computing have tended to start from the top down; that is, starting with a web page or GUI, then leading down to eventually arrive at the hardware. This approach has meant that our co-ops view things in terms of how hardware makes a web page happen. We flip this view on its head so that the co-ops start looking at things from the bottom up. We try to show them what hardware can do, then make it clear to them that the next step is up to them: It’s up to them to use their imagination to come up with what they want the hardware to do, and to use their brains to find a way to make it do exactly that.
This bottom-up perspective is also a good one for those interested in security and safety, because once you know what hardware can do, and how it does it, you also understand the things it could be made to do. For example, we discuss the architecture of static and dynamic RAM and its implications for performance, power, and price, then mention how Rowhammer works.
Putting C in Context
Since we are training our co-ops to contribute to our systems and not just how to code some isolated widgets, we strive to ensure that we make clear the connections that exist between the different components that ultimately make up these systems, from the silicon up. For example, we discuss the role of the arithmetic logic unit and how it reflects the results of its most recent operation in the status flags. One flag, the Z, or Zero Flag, indicates if the last result was zero or not zero—that’s it, nothing else. After we discuss logic gates, co-ops easily see how trivial it is to implement flags of this sort.
This simple but fundamental discussion sets the stage for further discussion of how hardware has influenced the C standard, where many things are defined in terms of whether all bits are zero or not all bits are zero.
We also spend quite a bit of time emphasizing that memory is not just about how many bytes are needed. Because systems are designed to run (that is, they are temporal and they change), it is crucial to think about memory not only in terms of physical size but also in terms of:
- How long the memory is needed (object storage duration)
- Who needs access to the memory (identifier scope and linkage)
Similarly, we challenge our co-ops to rethink the colloquial terms “local variable” and “global variable.” These terms are commonly used when teaching programming and are useful, but they are simplifications, lacking the sort of specifics we need to understand what they refer to and how they should and shouldn’t be used.
What, we ask, is the difference between these two declarations when they appear inside a function and when they appear outside of one?
int a;
static int b;
Returning to our common denominator, building systems that are functionally safe and secure from malicious interference, we use this question to emphasize that by restricting the lifespan of and access to variables (memory) as much as possible, we can establish spatial isolation, wherein unrelated activities are kept far apart from each other.
Isolation
Since spatial isolation is crucial to any robust system and since it requires cooperation between software and hardware, we discuss the purpose of the hardware Memory Management Unit (MMU), which leads to discussing the POSIX mmap() function underpinning most memory management in POSIX. An understanding of the address space mappings an MMU provides, and the spatial isolation it enforces, also nicely leads to a discussion of how an IOMMU (SMMU) can provide spatial isolation when direct memory access (DMA) devices are implemented, as it can for guests running in hypervisor virtual machines.