Introducing BlackBerry QNX Boot Camp

EMBEDDED SYSTEMS / 11.10.21 / Michael Brown

Every four months, we on the BlackBerry® QNX® OS kernel team have a new group of co-op students (AKA Interns) come in to help us out with kernel and safety-related work. Over the last four years, we've had anywhere from six to 14 people per term, from a range of schools and with diverse sets of courses and previous co-op terms under their belts.

A few years ago, to establish a common foundation and reference, we put together a training course for our new co-ops. This training course is in addition to the standard programming with QNX® training we offer all new technical hires. Over time, we’ve expanded this course and the materials we provide to fill in gaps and address feedback, as well as to bring in discussions of new technology that is becoming more commonly used.

As word of the training we were offering to the co-ops joining the kernel team got out to other teams at BlackBerry QNX, and once we felt the training was pretty mature, we began inviting co-ops from other teams to join. Now every term begins with around 30 co-ops going through the training, and usually some full-time people too!

Safety First

Here at BlackBerry QNX, we design and build functionally safe software for critical embedded systems used in everything from medical devices to high-speed trains, so we always start our training with an introduction to safety and the idea of a safety culture. This includes emphasizing that a co-op’s job description includes asking questions, not only during training, but throughout their time here. “I think I understand” and “It seems to work most of the time” in fact mean that there is a risk something will not work, and safety is all about risk management.

As a pilot and a ground school instructor, I particularly like to briefly discuss aviation history and how accidents in the past led to the adoption of a safety culture, how successful fostering safety culture has been in that industry, and how recent accidents from the headlines emphasize that a safety culture requires constant vigilance, introspection, and improvement.

My colleague, Chris Hobbs, who is also a pilot and ground school instructor, recently published a post on this blog in which he cites C. A. R. Hoare warning of the dangers of poor programming. If you’re worried about threats to our very survival, look no further than your own code.

The Curriculum

On the more strictly technical side of things, over six days of training we cover hardware, computer architecture, advanced topics in C, including some language law (the formal language used in the specification of programming languages), POSIX, operating system concepts, the architecture of QNX systems, and finally, the main tools we use for development, including for revision control, code review and debugging. We provide a mix of theory and practical, real-world examples and use cases, because co-ops can expect to start right away applying what they learn to solve real problems.

The View from Below

Our overall goal is as much—in fact, more—to bring about a change in co-ops’ perspective as it is to instruct them how to program for BlackBerry QNX systems. For more than 20 years, introductions to computing have tended to start from the top down; that is, starting with a web page or GUI, then leading down to eventually arrive at the hardware. This approach has meant that our co-ops view things in terms of how hardware makes a web page happen. We flip this view on its head so that the co-ops start looking at things from the bottom up. We try to show them what hardware can do, then make it clear to them that the next step is up to them: It’s up to them to use their imagination to come up with what they want the hardware to do, and to use their brains to find a way to make it do exactly that.

This bottom-up perspective is also a good one for those interested in security and safety, because once you know what hardware can do, and how it does it, you also understand the things it could be made to do. For example, we discuss the architecture of static and dynamic RAM, its implications for performance, power and price, then mention how Rowhammer works.

Putting C in Context

Since we are training our co-ops to contribute to our systems and not just how to code some isolated widgets, we strive to ensure that we make clear the connections that exist between the different components that ultimately make up these systems, from the silicon up. For example, we discuss the role of the arithmetic logic unit and how it reflects the results of its most recent operation in the status flags. One flag, the Z, or Zero Flag, indicates if the last result was zero or not zero—that’s it, nothing else. After we discuss logic gates, co-ops easily see how trivial it is to implement flags of this sort.
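As a rough illustration of why such a flag is cheap to build, the sketch below models a Zero flag in C as an OR of every result bit followed by an inversion (in other words, one wide NOR gate). This is not taken from our training material: the names `zero_flag`, `alu_add8` and `struct alu_result` are hypothetical, and the 8-bit width is an assumption made to keep the example small.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-bit ALU result: the value plus its Zero flag. */
struct alu_result {
    uint8_t value;
    bool    zero;   /* true when every bit of value is 0 */
};

/* OR all eight result bits together, then invert: a NOR gate in software. */
bool zero_flag(uint8_t value)
{
    unsigned any_bit_set = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        any_bit_set |= (value >> bit) & 1u;
    }
    return !any_bit_set;
}

/* Add two 8-bit operands (wrapping modulo 256) and report the Zero flag. */
struct alu_result alu_add8(uint8_t a, uint8_t b)
{
    struct alu_result r;
    r.value = (uint8_t)(a + b);
    r.zero  = zero_flag(r.value);
    return r;
}
```

Calling `alu_add8(255, 1)` wraps to 0 and sets the flag, while `alu_add8(1, 1)` leaves it clear. The flag says nothing else about the result.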

This simple but fundamental discussion sets the stage for further discussion of how hardware has influenced the C standard, where many things are defined in terms of all bits are zero, or not all bits are zero.
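A few small C functions can make this zero-or-not-zero view concrete. They are only a sketch with hypothetical names (`is_nonzero`, `logical_not`, `count_chars`). Note that the standard words these rules as comparisons against 0, which is the form used in the comments.

```c
#include <stddef.h>

/* An if statement takes its branch when the controlling expression
 * compares unequal to 0, so this is the same test as (x != 0). */
int is_nonzero(int x)
{
    if (x) {
        return 1;
    }
    return 0;
}

/* The logical NOT operator yields an int that is exactly 0 or 1. */
int logical_not(int x)
{
    return !x;
}

/* Any nonzero character keeps the loop going; the terminating
 * null character (value 0) ends it. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*s++) {
        n++;
    }
    return n;
}
```

In each case the only question asked of the value is whether it is zero, which is exactly what a Zero flag answers.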

We also spend quite a bit of time emphasizing that memory is not just about how many bytes are needed. Systems are designed to run; that is, they are temporal and they change. It is therefore crucial to think about memory not only in terms of physical size but also to focus on:

How long the memory is needed (object storage duration)
Who needs access to the memory (identifier scope, and linkage)

Similarly, we challenge our co-ops to rethink the colloquial terms “local variable” and “global variable.” These terms are commonly used when teaching programming and are useful, but they are simplifications, lacking the sort of specifics we need to understand what they refer to and how they should and shouldn’t be used.

What, we ask, is the difference between these two variables inside and outside of a function?

int a;

static int b;

If we return to our common denominator, building systems that are functionally safe and secure from malicious interference, we can use this question to emphasize that by restricting variable (memory) lifespans and access as much as possible, we establish spatial isolation, wherein unrelated activities are kept far apart from each other.
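One way to answer the question about `int a;` and `static int b;` is to put both declarations in both places and annotate what the C standard says about each. The sketch below does that. The function names `next_auto` and `next_static` are hypothetical, and the block-scope `a` is given an initializer because reading an uninitialized automatic object would be an error.

```c
/* File scope ("outside of a function") */
int a;          /* static storage duration, external linkage:
                   other translation units can reach it via `extern int a;` */
static int b;   /* static storage duration, internal linkage:
                   visible only within this translation unit */

/* Block scope ("inside a function") */
int next_auto(void)
{
    int a = 0;      /* automatic storage duration, no linkage:
                       a fresh object on every call; hides the file-scope a */
    a++;
    return a;       /* always 1 */
}

int next_static(void)
{
    static int b;   /* static storage duration, no linkage:
                       zero-initialized once, keeps its value between calls;
                       hides the file-scope b */
    b++;
    return b;       /* 1, then 2, then 3, ... */
}
```

The same keyword changes linkage at file scope and storage duration at block scope, which is why "local" and "global" are too coarse to describe what is going on.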

Isolation

Since spatial isolation is crucial to any robust system and since it requires cooperation between software and hardware, we discuss the purpose of the hardware Memory Management Unit (MMU), which leads to discussing the POSIX mmap() function underpinning most memory management in POSIX. An understanding of the address space mappings an MMU provides, and the spatial isolation it enforces, also nicely leads to a discussion of how an IOMMU (SMMU) can provide spatial isolation when direct memory access (DMA) devices are implemented, as it can for guests running in hypervisor virtual machines.

Figure 1. A diagram from a blog post by my colleague, Randy Martin, about how our smmuman service works with the hardware IOMMU/SMMU to contain DMA device access to memory.
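To make the mmap() discussion above a little more tangible, here is a minimal sketch that maps one page, writes to it, drops write permission, and unmaps it. It is not QNX-specific code. The function name `map_scratch_page` is hypothetical, and the `MAP_ANONYMOUS` flag is an assumption: it is widely available but some systems spell it `MAP_ANON`, so check the documentation for your platform.

```c
#define _DEFAULT_SOURCE   /* assumption: asks glibc to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private, zero-filled page, use it, make it read-only, unmap it.
 * Returns 0 on success and -1 on any failure. */
int map_scratch_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    size_t len = (size_t)page;

    /* Ask the OS for a new region in this process's address space. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, 0xA5, len);   /* allowed: the mapping is writable */

    /* Drop write permission; a later store to this page would now fault. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return -1;
    }

    int ok = (((unsigned char *)p)[0] == 0xA5);   /* reads still work */
    if (munmap(p, len) != 0) {
        return -1;
    }
    return ok ? 0 : -1;
}
```

The permissions passed to mmap() and mprotect() are what the MMU ends up enforcing for that range of addresses, which is the link between the software call and the hardware isolation.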

We stress to our co-ops that memorizing the C standard is not an approach that scales. Limiting their code to the safe areas of the language will simplify their lives, however. In short, the dragons of undefined behavior are easily avoided, and MISRA C[^1] is especially useful.
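As one small example of staying in defined territory, signed integer overflow is undefined behavior in C, but it can be ruled out with a check before the addition. The helper below is a sketch with a hypothetical name (`checked_add`). It illustrates the general idea and is not a statement of any particular MISRA C rule.

```c
#include <limits.h>
#include <stdbool.h>

/* Add a and b only if the result fits in an int.
 * Returns true and stores the sum in *out on success;
 * returns false and leaves *out untouched otherwise. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would overflow: undefined behavior */
    }
    *out = a + b;
    return true;
}
```

The checks themselves cannot overflow, because `INT_MAX - b` is only evaluated when `b` is positive and `INT_MIN - b` only when `b` is negative.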

Wrapping Up

Our first full day of training covers hardware and computer architecture, and at the end of the day we bring all the ideas together by simulating a computer. It’s a very simple computer, but a computer that has enough parts to serve as a good learning exercise: CPU, MMU, system bus, RAM, flash, and a UART. In this simulation, everyone in the room gets a role to play, and with 30 people, we can be very fine-grained with the parts.

One person will be the level 1 instruction cache, one person will be the MMU page table walker, one the MMU TLB—that’s a translation lookaside buffer, but of course you knew that.

If we have enough people, we’ll have a TLB hierarchy too, all the way down to someone acting as the chip select for a UART, which is, of course, yet another role. In a large meeting room encircled with whiteboards, each person stands by a part of a whiteboard, with the state of the hardware they represent on the whiteboard beside them, visible to everyone else. We provide an initial state for everyone (coming out of reset), then let the simulation run.
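For readers who can't join us in the room, the interplay between the TLB and the page table walker can be approximated in a few lines of C. Everything here is a toy and an assumption: 4 KiB pages, a single-level page table of 16 entries, a four-slot direct-mapped TLB, and the names `toy_mmu` and `toy_translate` are all invented for this sketch and do not describe any real MMU.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* assumed 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define NUM_PAGES  16u                          /* toy single-level page table */
#define TLB_SLOTS  4u                           /* toy direct-mapped TLB */

struct pte       { bool valid; uint32_t frame; };
struct tlb_entry { bool valid; uint32_t vpn; uint32_t frame; };

struct toy_mmu {
    struct pte       page_table[NUM_PAGES];
    struct tlb_entry tlb[TLB_SLOTS];
    unsigned         walks;     /* how often the walker was needed */
};

/* Translate vaddr to *paddr; returns false on a fault (no valid mapping). */
bool toy_translate(struct toy_mmu *mmu, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;         /* virtual page number */
    struct tlb_entry *slot = &mmu->tlb[vpn % TLB_SLOTS];

    if (!(slot->valid && slot->vpn == vpn)) {
        /* TLB miss: the page table walker has to read the table. */
        mmu->walks++;
        if (vpn >= NUM_PAGES || !mmu->page_table[vpn].valid) {
            return false;
        }
        slot->valid = true;
        slot->vpn   = vpn;
        slot->frame = mmu->page_table[vpn].frame;
    }
    *paddr = (slot->frame << PAGE_SHIFT) | (vaddr & PAGE_MASK);
    return true;
}
```

With page 2 mapped to frame 7, translating `0x2abc` gives `0x7abc` after one walk, and a second access to the same page is served by the TLB without another walk. This is the same warming-up effect participants notice during the exercise.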

“Hello, World!” at Last

This, of course, is when the questions start: “It all made sense when we were talking, but why did we have to wait for him before she could complete her request?” The proof, as they say, is in the pudding; or in our case, the computing.

Rather than simply answering questions, we hand them to the group to work through. This approach usually leads to some refinements both of the particular tasks we’re trying to complete with our simulation and of our co-ops’ understanding of how hardware and software work together. Starting out, things usually go rather slowly, but as participants absorb and stitch together the concepts involved in the fetching and execution of the first instruction (and as caches warm up), the execution of the second instruction goes much more smoothly. The code being executed is a short and simple loop, and we typically get through the loop once, which causes an "H" to be written to the UART's transmit register, but it’s easy to see from there how "ello, World!\n" would follow.
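The loop in question might look something like the sketch below. This is not the code we use in the room. The UART is a hypothetical host-side model (`struct toy_uart`) with a capture buffer standing in for the serial line, and a real driver would normally check a status register before each write, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UART model: a transmit register plus a capture buffer
 * standing in for the serial line, so the sketch runs on a host. */
struct toy_uart {
    volatile uint8_t tx;        /* last byte written to the transmit register */
    char             line[32];  /* everything "sent" so far, null-terminated */
    size_t           sent;
};

/* Store one byte to the transmit register and record it on the "wire". */
void toy_uart_write(struct toy_uart *u, uint8_t byte)
{
    u->tx = byte;
    if (u->sent < sizeof u->line - 1) {
        u->line[u->sent++] = (char)byte;
        u->line[u->sent]   = '\0';
    }
}

/* Run at most max_iterations passes of the loop, one byte per pass.
 * Returns the number of bytes written. */
size_t toy_hello(struct toy_uart *u, size_t max_iterations)
{
    const char *msg = "Hello, World!\n";
    size_t i = 0;
    while (msg[i] != '\0' && i < max_iterations) {
        toy_uart_write(u, (uint8_t)msg[i]);
        i++;
    }
    return i;
}
```

Calling `toy_hello(&uart, 1)` on a zero-initialized `struct toy_uart` corresponds to the single pass we usually manage in the room: one byte, `'H'`, lands in the transmit register.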

Unfortunately, COVID-19 has paused our “Hello, World!” simulation, since we can’t safely gather in one room, and we do need to be together to be able to all see and discuss the system states and state changes together. I do look forward to resuming it soon, though.

From Training to Mentoring

Training doesn’t end with the end of the training week. We continue with weekly co-op meetings to cover more material in greater depth, and we ask co-ops to prepare and give presentations on an area they’ve been working on. These are opportunities for them to gain experience with presentations and public speaking, useful skills even for the best coders. In most cases, co-ops will have a week or more to prepare, though in some instances we’ve asked a co-op to present on the spot. In the real world, it’s not uncommon for a programmer to be asked without warning: “Can you explain to the customer how this works?”

No, we’re not trying to embarrass anyone. On the contrary, we want to offer everyone the opportunity to practice among people they know, in a friendly atmosphere, where mistakes are learning opportunities.

Teaching as a means of learning is a lesson I learned at my first job out of university, from my first mentor, Adrian Zissos, who, shortly after I’d started working, asked me to prepare some training on the database software we used in all our products:

Michael: “Uh, I've been using it less than a month. I don't know it inside out.”
Adrian: “You will by the time you've done the presentation.”

Based on anecdotal evidence, I’m also going to claim that our approach to training and mentorship plays a role in many co-ops returning to BlackBerry QNX and other BlackBerry divisions, not only for further co-op terms, but also as full-time employees. Four members of the current BlackBerry QNX kernel team are previous kernel co-ops, and I’ve lost count of how many co-ops who have been through our training have returned to work here full-time.

And, yes, we are hiring. Look for BlackBerry QNX under Student Opportunities.

About Michael Brown

Senior Software Developer QNX Core/OS, BlackBerry QNX

Michael is an electrical engineer who for the last 30 years has worked on petroleum economics software, custom CPUs, virtual machines, digital imaging devices, and operating systems.
