OS Rant - Various thoughts about Linux, Windows, and OS design.

Introduction

I should start this article with an explanation of what it is not.

It is not well-organized. It is not well thought-out.

It is, after all, just a rant. I am writing it at 6 in the morning because I thought it would be fun.

I will share my opinions about OS design, both about existing OSes and about the plans for my personal hobby OS.

My history with Operating Systems

Feel free to skip this section; it's just a boring recounting of events.

My first experience with a computer was my dad's Mac, though I was too young to do much on it.

Next, I got a Windows 7 computer. It was really weak, with only 4 gigs of RAM and an integrated GPU. I used it for a few months, but around the release of Windows 10, young me really liked the Windows 8.1 calculator app (it could convert joules into units of batteries and bananas!), so I updated it to Windows 10. That completely killed it, and shortly afterwards my dad sent it to a repair shop, which installed Ubuntu on it. They didn't give me the password, so I was not able to update it, and a live USB + chroot was too scary for me back then.

I used that for a bit, then I got an iMac. That one had 8 gigs of RAM and a better CPU (an Intel i5-4460, if I remember right). It lasted me about a year, but then everything became sluggish: the browser took minutes to open. So I installed Mint on it. This was my first proper experience with Linux. I was learning Java programming at the time, which forced me to learn a bit more about the system.

Next, I got a "gaming PC", which came with Windows 10. This was my first proper experience with Windows, too. I tried to daily drive it, which worked for a bit, but I went back to Linux and distro-hopped: Mint, Manjaro, Arch, PopOS...

Then I got a much better computer: a Lenovo Legion with a whopping 32 gigabytes of RAM, an i7-8750H, and a GTX 1060M. By then, I was much more experienced with Linux and daily drove mostly Arch, though I was still jumping around.

I switched to NixOS somewhere around then, and daily drove it until recently.

That computer then broke, and I got another one: an HP Omen with an i7-10750H and an RTX 2070M. The 2070M is currently broken, but the laptop is still in use.

In addition, I got a school laptop, on which I put FreeBSD. The experience was a bit painful at first, but not that hard; it was definitely a much easier start than NixOS. FreeBSD certainly felt much better designed than Linux, but it had compatibility issues. For example, the system's iGPU (Lucienne) wasn't fully supported.

I also got a travel laptop, on which I tried out Fedora for the first time. It was a horrible experience, as nothing worked out of the box. I spent a few months fixing things, then switched it to Arch.

I also played with TempleOS in a VM. I liked some of its ideas, such as recompiling the OS at boot and mixing code with data.

About Linux

Currently, I still very much daily drive Linux. I consider it the least bad of the options.

Windows has severe issues with effectively everything, and none of the software I use works on it. macOS requires expensive hardware. FreeBSD has pretty bad software & hardware compatibility, and overall lacks developers.

However, by any measure, the whole system is a mess. I won't go into detail about every problem with each subsystem here, but I will go over some pain points I have with Linux, and later, how I plan to address them in my hobby OS, ZapsOS.

First, Linux is a monolithic kernel. This means that the whole kernel, drivers included, runs as a single program in one shared address space. It does allow loading additional code at runtime through modules, but modules run in that same address space, so when a driver encounters a fatal error, it brings down the whole system more often than not.

The core of the kernel is still a single monolithic binary, which cannot be updated while the system is running. There is live patching, but it has severe limitations: it works by swapping out individual functions, so changes that touch data structures are hard to apply safely. kexec doesn't count, as it wipes all state; it's just a fast reboot.

Next, the kernel is a Unix-like core with a bunch of subsystems glued on top. For each thing you wanna do, there are usually 2-4 overlapping subsystems, which just so happen to implement what you need, usually in an extremely cursed way.

Permissions are a nightmare. There are core file permissions, 4 sets of uids and gids (real, effective, saved, and filesystem), and supplementary groups.

There are many different LSMs (Linux Security Modules). Some are specialized, such as Yama, which adds slightly more granular control over ptrace (the core debug interface; there is more than one debug interface, and some respect Yama while others don't), as well as some other unrelated things. There is an LSM for dropping privileges. There is an LSM for slightly more granular permissions: capabilities. Everyone just shoves things into CAP_SYS_ADMIN anyway. There are two competing LSMs for MAC (Mandatory Access Control): AppArmor and SELinux. Red Hat is the only one that knows how to use SELinux; maybe if you spend a few months reverse engineering Fedora & the kernel, you will know too. LSMs are the only system that can override the rule that uid 0 (root) can do anything.

There are namespaces for isolating various kernel structures. There are chroots, which only change the root directory used for path lookups and do nothing to contain a root process inside, so an escape is trivial. Chroot jail is a misnomer. FreeBSD does this SO much better with jails, or so I've heard. There are 2 more chroot-like mechanisms that are more or less the same.

There are userspace systems too: Polkit for requesting authentication, PAM for modular authentication (in case you want to hook up a DNA scanner to your Linux desktop to authenticate to sudo), and udev, which assigns permissions to device files based on rules.
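To make the "4 sets of uids and gids" concrete: on Linux, /proc/<pid>/status lists them on its Uid: and Gid: lines, in the order real, effective, saved set, and filesystem, with the supplementary groups on its Groups: line. The Python sketch below only parses those lines; it is Linux-only, and read_id_sets is a made-up helper name.

```python
import os

def read_id_sets(pid="self"):
    """Hypothetical helper: return the UID/GID sets and supplementary groups of a process.

    Linux-only. /proc/<pid>/status lists the four IDs on the Uid:/Gid: lines in
    the order real, effective, saved set, filesystem.
    """
    ids = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("Uid", "Gid"):
                real, effective, saved, fs = (int(v) for v in value.split())
                ids[key] = {"real": real, "effective": effective,
                            "saved": saved, "filesystem": fs}
            elif key == "Groups":
                ids["Groups"] = [int(g) for g in value.split()]
    return ids

if __name__ == "__main__":
    for kind, values in read_id_sets().items():
        print(kind, values)
```

For an ordinary process all four UIDs are usually identical; they only diverge for setuid programs or code that drops privileges.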

Applications, by default, have effectively the same access as the user running them. This is in contrast to Android, where apps have basically no access by default; web apps, which get access only through case-by-case decisions made by the user; or even WASM, where modules have absolutely NO access to anything that wasn't explicitly given to them.

Solution?

A microkernel does not suffer from the same stability issues as a monolithic kernel, since drivers run as separate tasks that can crash and be restarted without taking down the rest of the system. For the same reason, a microkernel can be updated while the system is running, by individually restarting driver tasks. By carefully designing the APIs, it is possible to freeze apps and resume them from disk, or even serialize their state and migrate them to a new version. One can even give a running app access to nothing by default, WASM style, and let the user granularly configure what is allowed, denied, or faked. This also helps with debugging, since a task/process is much more malleable.
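A rough sketch of the driver-restart idea, with every name (Supervisor, FlakyDiskDriver, DriverCrashed) hypothetical and Python objects standing in for separate tasks. Clients talk to a supervisor rather than to the driver itself, so when the driver crashes, the supervisor can start a fresh instance and retry the request. A real microkernel would do this across task boundaries and would have to decide which requests are safe to retry.

```python
class DriverCrashed(Exception):
    """Raised by a driver task when it hits a fatal error."""

class FlakyDiskDriver:
    """Hypothetical driver whose very first instance wedges on every request."""
    def __init__(self, generation):
        self.generation = generation

    def read_block(self, n):
        if self.generation == 0:
            raise DriverCrashed("controller wedged")
        return f"block {n} (driver generation {self.generation})"

class Supervisor:
    """Owns a driver task and restarts it when it crashes.

    Clients only hold a reference to the supervisor, so a restart is invisible
    to them apart from the retry.
    """
    def __init__(self, factory, max_restarts=3):
        self.factory = factory
        self.max_restarts = max_restarts
        self.restarts = 0
        self.driver = factory(self.restarts)

    def call(self, method, *args):
        while True:
            try:
                return getattr(self.driver, method)(*args)
            except DriverCrashed:
                if self.restarts >= self.max_restarts:
                    raise  # give up and report the failure to the caller
                self.restarts += 1
                # Throw away the crashed instance and its state, start a fresh one.
                self.driver = self.factory(self.restarts)

if __name__ == "__main__":
    disk = Supervisor(FlakyDiskDriver)
    print(disk.call("read_block", 7))
```

The cap on restarts keeps a driver that crashes every time from looping forever; the caller then gets the error instead of a hang.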

By carefully designing the APIs such that each app has to declare beforehand what it wants to do and how it should look, including the layouts of its windows, it is possible to let the user configure much more about the way they interact with apps, and to enforce a consistent style. This also allows live inspection of the UI, for example. A unified UI system has a lot of advantages, and the same approach allows a unified logging API, among other things.

This is precisely what I plan ZapsOS to do.

Hybrid Exokernel

ZapsOS is what I like to call a hybrid exokernel. The initialization code is just a simple shim that looks for the scheduler process and loads some basic files.

The kernel itself is composed of many separate tasks.

A task might run either in kernel mode, or in user mode. A kernel mode task does not have many security guarantees (though I wish to look into static analysis & restricted memory regions for this), but it has (un)limited direct access to the hardware. It is also fast, since it does not need a context switch to access the hardware.

Kernel tasks will be used for drivers that need direct access to the hardware (such as a PCIe provider) or fast access to it (such as a GPU driver).

Shared memory will be used for communication between userspace tasks, between kernelspace tasks, and between userspace and kernelspace.

This avoids most context switches. However, not all communication will avoid immediate context switches; where it makes sense, a context switch will still be performed.
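As an illustration of why shared memory avoids context switches, here is a sketch of a single-producer/single-consumer ring buffer, a common layout for this kind of channel (not necessarily the one ZapsOS will use). Both sides only read and write bytes in one buffer: the producer only ever advances the tail index and the consumer only ever advances the head index. A plain bytearray stands in for a shared mapping here; a real implementation between tasks would also need atomic loads/stores and memory barriers on the two indices, which this Python sketch does not express.

```python
import struct

U64 = struct.Struct("<Q")
LEN = struct.Struct("<H")
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 8, 16  # header: head index, tail index

class RingBuffer:
    """Single-producer/single-consumer queue laid out in one flat buffer.

    The producer only writes the tail index and the slots; the consumer only
    writes the head index. Both indices grow forever and are reduced modulo
    the slot count when used.
    """
    def __init__(self, buf, slots, slot_size):
        if len(buf) < DATA_OFF + slots * slot_size:
            raise ValueError("buffer too small")
        self.buf = memoryview(buf)
        self.slots = slots
        self.slot_size = slot_size

    def _load(self, off):
        return U64.unpack_from(self.buf, off)[0]

    def _slot(self, index):
        return DATA_OFF + (index % self.slots) * self.slot_size

    def push(self, msg):
        """Producer side. Returns False when full (the caller could then block)."""
        if len(msg) > self.slot_size - LEN.size:
            raise ValueError("message larger than a slot")
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if tail - head == self.slots:
            return False
        off = self._slot(tail)
        LEN.pack_into(self.buf, off, len(msg))
        self.buf[off + LEN.size:off + LEN.size + len(msg)] = msg
        U64.pack_into(self.buf, TAIL_OFF, tail + 1)  # publish only after the data is written
        return True

    def pop(self):
        """Consumer side. Returns None when empty."""
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if head == tail:
            return None
        off = self._slot(head)
        (length,) = LEN.unpack_from(self.buf, off)
        msg = bytes(self.buf[off + LEN.size:off + LEN.size + length])
        U64.pack_into(self.buf, HEAD_OFF, head + 1)  # free the slot only after copying out
        return msg
```

Two RingBuffer objects created over the same bytearray behave like the two ends of a channel, which is the point: nothing but memory is shared. When the queue is full or empty is exactly where a real system would fall back to blocking, and therefore to a context switch.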

IO Objects

On ZapsOS, every user application is packaged in a special format that declares up front, using a schema language, which IO Objects (IOOs) it will use. It can either ask for an instance (such as the logger), or ask for permission to create IOOs of a given type (such as a window). Think of the window IOO as a data-driven UI description: rather than specifying what the UI looks like, it specifies what data it represents. This goes both for input from the user, such as a color picker, and output to the user, such as a piece of text or an image. More generic IOOs, such as one representing any JSON object, are also possible.
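A minimal sketch of how declared IOO requests could be enforced at runtime. The schema language itself is not specified, so Python dataclasses stand in for it, and every name (Access, IOORequest, Manifest, AppRuntime) is hypothetical. An app can only touch what it declared up front, and only the part of that declaration the user actually granted.

```python
from dataclasses import dataclass, field
from enum import Enum

class Access(Enum):
    INSTANCE = "instance"  # use an existing system object, e.g. the logger
    CREATE = "create"      # create new objects of a type, e.g. windows

@dataclass(frozen=True)
class IOORequest:
    type: str
    access: Access

@dataclass
class Manifest:
    app: str
    requests: list = field(default_factory=list)  # declared up front, before launch

class PermissionDenied(Exception):
    pass

@dataclass
class AppRuntime:
    manifest: Manifest
    granted: set = field(default_factory=set)  # the subset the user allowed

    def _check(self, req):
        if req not in self.manifest.requests:
            raise PermissionDenied(f"{req.type}/{req.access.value} was never declared")
        if req not in self.granted:
            raise PermissionDenied(f"{req.type}/{req.access.value} was not granted")

    def open_instance(self, ioo_type):
        self._check(IOORequest(ioo_type, Access.INSTANCE))
        return f"<{ioo_type} for {self.manifest.app}>"

    def create(self, ioo_type, **data):
        self._check(IOORequest(ioo_type, Access.CREATE))
        # The object describes data, not pixels: the system decides how to show it.
        return {"type": ioo_type, "owner": self.manifest.app, "data": data}

if __name__ == "__main__":
    manifest = Manifest("paint", [IOORequest("logger", Access.INSTANCE),
                                  IOORequest("window", Access.CREATE)])
    app = AppRuntime(manifest, granted=set(manifest.requests))
    print(app.create("window", accent={"kind": "color", "value": "#ff8800"}))
```

Because the manifest is known before launch, the system can show the user the full list of requests once, instead of prompting piecemeal at runtime.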

How each IOO is displayed or handled depends on system components.

Another example is classic text printing. On Linux, this is implemented through the write syscall on file descriptor 1. FDs 0, 1, and 2 are inherited from the parent process; in an interactive session, they refer to the terminal, usually a pty set up by the terminal emulator (or the console). FD 0 is for reading input, 1 is for writing output, and 2 is also for output, but for errors, and some things work differently for it (for example, libc's stdio does not buffer stderr). Terminals accept complex escape sequences for formatting, interactive applications, etc.
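For contrast, this is roughly what the Linux path looks like: formatting is just escape bytes (ANSI SGR sequences here) mixed into the same byte stream as the text, written with write(2) on a file descriptor. The sketch uses Python's os.write, a thin wrapper over the syscall; print_error is a made-up helper, and the loop is there because write(2) may write fewer bytes than requested.

```python
import os

BOLD_RED = "\x1b[1;31m"  # SGR: bold + red foreground
RESET = "\x1b[0m"        # SGR: reset all attributes

def print_error(msg, fd=2):
    """Write a formatted error line straight to a file descriptor with write(2)."""
    data = f"{BOLD_RED}error:{RESET} {msg}\n".encode()
    while data:  # write(2) may perform a partial write
        written = os.write(fd, data)
        data = data[written:]

if __name__ == "__main__":
    os.write(1, b"plain output on fd 1\n")
    print_error("something went wrong")  # goes to fd 2, stderr
```

Anything reading the other end, be it a terminal, a pipe, or a log file, receives the escape bytes verbatim and has to interpret or strip them itself.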

Instead, on ZapsOS, you would define an IOO for a stream of lines of formatted text, and it will act effectively the same as a CLI app.
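A sketch of what a "stream of lines of formatted text" IOO might carry instead, assuming styling is described as data (Span, Line, and both renderers are hypothetical names). The app only produces Line values; the system decides whether to render them as ANSI escapes for a terminal, as plain text for a log file, or as something else entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Span:
    text: str
    bold: bool = False
    color: Optional[str] = None  # a semantic color name, not a terminal code

@dataclass(frozen=True)
class Line:
    spans: Tuple[Span, ...]
    is_error: bool = False  # replaces the stdout/stderr split

ANSI_COLORS = {"red": 31, "green": 32, "yellow": 33}

def render_ansi(line):
    """Backend for a classic terminal: turn styling back into SGR escapes."""
    out = []
    for span in line.spans:
        codes = (["1"] if span.bold else []) + (
            [str(ANSI_COLORS[span.color])] if span.color in ANSI_COLORS else [])
        out.append(f"\x1b[{';'.join(codes)}m{span.text}\x1b[0m" if codes else span.text)
    return "".join(out)

def render_plain(line):
    """Backend for a log file: drop styling entirely."""
    return "".join(span.text for span in line.spans)
```

The same Line can be handed to either backend; the app never has to know which one is in use, or whether output is a tty.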

Even launching a separate thread for processing, accessing the GPU for acceleration, or accessing the internet is done through IOOs.

Resource management

Each app needs to declare the memory limits, scheduling priority, maximum time between preemptions, etc. that it requires. This ensures that a malicious application without the right permissions cannot even overload the system. In addition, system components will always have reserved memory and CPU time, so even if the system is overloaded, the system UI will remain responsive.
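One way such declarations could be enforced is plain admission control, sketched below with hypothetical names (ResourceSpec, Admission) and made-up numbers. A slice of memory and CPU is reserved for system components before any app is admitted, and a task only starts if its declared needs fit what is left. The latency field is carried along for the scheduler and not checked here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceSpec:
    memory_mib: int
    cpu_share: float     # CPUs guaranteed, e.g. 0.5 = half a core
    max_latency_ms: int  # for the scheduler; not checked by admission below

class Admission:
    """Admit tasks only while their declared needs fit the remaining budget."""

    def __init__(self, total_mib, total_cpus, reserved_mib, reserved_cpus):
        # The system's share is carved out first, so apps can never claim it.
        self.free_mib = total_mib - reserved_mib
        self.free_cpus = total_cpus - reserved_cpus
        self.tasks = {}

    def admit(self, name, spec):
        if spec.memory_mib > self.free_mib or spec.cpu_share > self.free_cpus:
            return False  # refuse up front instead of OOM-killing later
        self.free_mib -= spec.memory_mib
        self.free_cpus -= spec.cpu_share
        self.tasks[name] = spec
        return True

    def release(self, name):
        spec = self.tasks.pop(name)
        self.free_mib += spec.memory_mib
        self.free_cpus += spec.cpu_share
```

Refusing at admission time moves the failure to the one task that asked for too much, rather than to whichever process a killer happens to pick later.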

Linux has big issues with managing resources. A single ninja instance running nproc jobs will easily run the system out of memory and kill your graphical session. Linux attempts to mitigate this with the OOM (out-of-memory) killer. Unfortunately, it kicks in way too late, and almost always murders the wrong process (for example, when ninja was overloading my RAM at 60 gigs, it killed an instance of VSCodium; it also loves murdering Xorg for some reason). There are projects such as earlyoom that do a similar thing in userspace, but I have had little success with them.
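For poking at the current Linux behaviour, the kernel exposes each process's badness score in /proc/<pid>/oom_score and a user-tunable bias in /proc/<pid>/oom_score_adj (range -1000 to 1000, inherited by child processes). The sketch below lists the processes the OOM killer would currently favour, and shows how a process could volunteer itself, for example a shell that then launches ninja. This is a partial mitigation, not a fix, and both helper names are made up.

```python
import os

def oom_candidates(limit=5):
    """Return (oom_score, pid, name) for the processes the OOM killer favours most."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/oom_score") as f:
                score = int(f.read())
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited, or we may not read it
        found.append((score, int(entry), name))
    return sorted(found, reverse=True)[:limit]

def volunteer_for_oom(adj=500):
    """Raise this process's oom_score_adj so it (and children started afterwards)
    is killed before everything else. Raising needs no privileges; lowering it
    again may require CAP_SYS_RESOURCE."""
    with open("/proc/self/oom_score_adj", "w") as f:
        f.write(str(adj))

if __name__ == "__main__":
    for score, pid, name in oom_candidates():
        print(f"{score:5d} {pid:7d} {name}")
```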

Just to clarify, I have 128 gigabytes of RAM, and 256 gigabytes of swap.

Ghoul processes & Process Management

Linux has what I have dubbed ghoul processes. It's a process that's both alive and dead, but I can't call it a zombie, as that term already means something else. A ghoul process is in the uninterruptible sleep state (D, often called disk sleep), which means that it is stuck inside a syscall. The state is named that way because in the olden days, you were usually waiting for the disk to give you back data, so the kernel suspended the process rather than waste resources in a polling loop.

In Linux, the way you kill a process is by sending it a signal. The kernel delivers the signal the next time the process is about to return to userspace: it checks the signal handlers the process registered, and either invokes one or performs the default action. The SIGKILL signal, aka signal 9, cannot have a custom handler and will always murder the process. The problem is that a process in D state stays stuck in whatever its syscall is waiting on, and never gets to the point where the signal is handled.

A famous example of this is NFS (Network File System), which will randomly ghoulify processes, since it has issues timing out properly. Another example, which I personally encountered, was a GPU-accelerated process making a call that crashed the GPU driver. If that driver was not also driving the display, the system continued to work fine, but any process connected to the GPU device got ghoulified as soon as it tried to talk to the GPU. It is also problematic to forcibly free such a process, since you might leak memory by doing so.
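Ghouls are easy to spot from userspace: the scheduler state is the third field of /proc/<pid>/stat, and D means uninterruptible sleep. The one subtlety is that the second field is the command name in parentheses, which may itself contain spaces or parentheses, so the sketch splits after the last ')'. The helper names are made up; ps -eo pid,stat,comm shows the same information.

```python
import os

def proc_state(pid="self"):
    """Return the one-letter scheduler state of a process (R, S, D, Z, T, ...)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Format: "<pid> (<comm>) <state> ...". comm may contain ')', so use the last one.
    return stat[stat.rindex(")") + 2]

def ghoul_processes():
    """List (pid, name) of processes currently in uninterruptible sleep (D)."""
    ghouls = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if proc_state(entry) == "D":
                with open(f"/proc/{entry}/comm") as f:
                    ghouls.append((int(entry), f.read().strip()))
        except OSError:
            continue  # the process exited while we were looking
    return ghouls

if __name__ == "__main__":
    print(ghoul_processes())
```

Sending SIGKILL to anything this returns will typically just leave the signal pending until the stuck syscall finally returns.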

On ZapsOS, this is solved with proper process management. The resources attached to a process are tracked, so the kernel can forcibly free a process. In addition, as explained before, ZapsOS processes/tasks are quite malleable: they can be suspended to disk, serialized & updated, moved across devices over the network, shared between devices, and kept alive across driver restarts...
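The core of forcible freeing is bookkeeping: every resource is registered against its owner at the moment it is handed out, together with a way to release it. The sketch below (Kernel, Process, grant, and kill are all hypothetical) shows that shape; killing a process then never depends on the process itself returning from anything. Releasing in reverse order of acquisition is a common convention, assumed here.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    pid: int
    # handle -> callback that releases the resource behind it
    resources: dict = field(default_factory=dict)

class Kernel:
    """Hypothetical process table that records every resource it hands out."""

    def __init__(self):
        self.procs = {}
        self.next_pid = 1
        self.next_handle = 1

    def spawn(self):
        proc = Process(self.next_pid)
        self.procs[proc.pid] = proc
        self.next_pid += 1
        return proc.pid

    def grant(self, pid, release):
        """Hand out a resource and remember how to take it back."""
        handle = self.next_handle
        self.next_handle += 1
        self.procs[pid].resources[handle] = release
        return handle

    def kill(self, pid):
        """Tear a process down without its cooperation, even if it is blocked
        on a driver that will never answer."""
        proc = self.procs.pop(pid)
        for release in reversed(list(proc.resources.values())):
            release()  # e.g. unmap memory, cancel queued GPU work
        return len(proc.resources)
```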

Process sharing

Since all resources attached to a ZapsOS process are managed, it is possible to transfer all those resources to another device.

It is also very easy to build a networked cluster with ZapsOS, since it abstracts all resources behind IOOs. IOOs can be made available over the network and assigned to applications.

The state of resources/IOOs can also be synchronized over the network, so a window can exist on two devices at once; for example, two network-connected computers can act as two monitors for a single networked computer.
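One simple way to keep a window's state identical on two devices is to let the owning machine number every change and have each replica apply changes strictly in sequence order, ignoring duplicates and holding back changes that arrive early. The sketch assumes a reliable transport and a single owner; the names (WindowOwner, WindowReplica) are hypothetical, and a direct method call stands in for the network.

```python
from dataclasses import dataclass, field

@dataclass
class WindowReplica:
    """One device's copy of a shared window IOO's state."""
    state: dict = field(default_factory=dict)
    version: int = 0  # sequence number of the last change applied
    early: dict = field(default_factory=dict)  # changes that arrived out of order

    def apply(self, seq, changes):
        if seq <= self.version or seq in self.early:
            return  # duplicate delivery, ignore
        self.early[seq] = changes
        # Apply every change that is now next in sequence.
        while self.version + 1 in self.early:
            self.version += 1
            self.state.update(self.early.pop(self.version))

class WindowOwner:
    """The machine that owns the window numbers every change it makes."""
    def __init__(self):
        self.seq = 0
        self.replicas = []

    def update(self, **changes):
        self.seq += 1
        for replica in self.replicas:  # stands in for sending over the network
            replica.apply(self.seq, changes)
        return self.seq
```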

Devices present on multiple computers can be transparently shared between them.

Resource faking

Malicious applications might refuse to work when not given access to particular IOOs they requested. To address this issue, I plan to provide a suite of tools for faking devices, intercepting traffic between the application and a device, etc. This tooling can also be used for debugging purposes.
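A sketch of the faking/interception tooling, with Python duck typing standing in for an IOO interface and all class names hypothetical. FakeCamera satisfies an app that insists on a camera without exposing a real one, and Recorder wraps any IOO, real or fake, to log every call and its result.

```python
class Camera:
    """Stand-in for a real device IOO."""
    def capture(self):
        return b"\x89real-frame"

class FakeCamera:
    """Satisfies an app that insists on a camera, without exposing the real one."""
    def capture(self):
        return bytes(16)  # a blank frame

class Recorder:
    """Wraps any IOO and logs every method call with its result."""
    def __init__(self, target):
        self._target = target
        self.log = []

    def __getattr__(self, name):
        # Only called for attributes Recorder itself lacks, i.e. the IOO's own API.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            self.log.append((name, args, result))
            return result
        return wrapper
```

Because Recorder does not care what it wraps, the same tool serves both for auditing a suspicious app and for debugging a driver.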

Foreign processes

Since ZapsOS is being designed from scratch, I can make it easy for it to emulate other systems with compatibility layers. Two main compatibility layers are planned. The first is foreign processes. A foreign process is a process with custom loading, interrupt, memory, and syscall handling, as well as custom per-process data. This can be used to run raw Linux binaries just by replicating the Linux APIs. The same should be doable for Windows.
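A sketch of the syscall-handling part of a foreign process. The numbers are the real x86-64 Linux ones (write = 1, getpid = 39, exit = 60), and returning a negative errno is the Linux convention; everything else is hypothetical. In particular, a real implementation would receive a pointer into the foreign process's memory for write, not a bytes object, and would be driven by the trap that fires when the binary executes the syscall instruction.

```python
import errno

# Real x86-64 Linux syscall numbers.
SYS_WRITE, SYS_GETPID, SYS_EXIT = 1, 39, 60

class LinuxPersonality:
    """Hypothetical per-process handler set for a foreign Linux process."""

    def __init__(self, pid, stdout):
        self.pid = pid
        self.stdout = stdout  # host-side sink standing in for fd 1
        self.exit_code = None
        self.table = {SYS_WRITE: self.sys_write,
                      SYS_GETPID: self.sys_getpid,
                      SYS_EXIT: self.sys_exit}

    def dispatch(self, nr, *args):
        """Called when the foreign binary traps with syscall number nr (rax)
        and arguments (rdi, rsi, rdx, ...)."""
        handler = self.table.get(nr)
        if handler is None:
            return -errno.ENOSYS  # Linux returns -errno in rax on failure
        return handler(*args)

    def sys_write(self, fd, data, count):
        if fd != 1:
            return -errno.EBADF  # only fd 1 is wired up in this sketch
        chunk = bytes(data[:count])  # simplification: real code copies from guest memory
        self.stdout.append(chunk)
        return len(chunk)

    def sys_getpid(self):
        return self.pid

    def sys_exit(self, status):
        # A real exit never returns; the host would tear the process down here.
        self.exit_code = status
        return 0
```

The same table-driven shape would let a Windows personality plug in its own numbers and handlers.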

VM Drivers

Another planned compatibility layer is running a stripped-down kernel (probably Linux) in a VM, letting it own a device, and then writing a driver for the uAPI it exposes.

This allows me to support a very large number of devices by writing a single driver. The performance loss should be negligible (virtualization tech is really good nowadays), though this approach should still be avoided for high-performance applications.

GPU Muxxing

GPU software muxxing is what I like to call the ability to split a single GPU into many virtual GPUs. This is really useful for virtualization, as it allows a single GPU to serve both the host and many VMs.

I will write a software GPU muxxer (some GPUs also support this in hardware), which should have effectively no overhead. All it will do is give each VM a view of the GPU that is isolated from the other views (host & other VMs) for security reasons, and handle some basic translation & initialization. After all, we can already run many separate applications on a single GPU, can't we? So all that's needed is to properly implement this for VMs.
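The memory-isolation part of such a muxxer can be pictured as base-and-limit translation: each view (host or VM) owns one contiguous slice of VRAM, and every guest address is bounds-checked before being offset into it. The sketch below shows only that part, with hypothetical names and a hypothetical even split; a real muxxer would also have to handle GPU page tables, command submission, and scheduling between views.

```python
from dataclasses import dataclass

GiB = 1 << 30

@dataclass(frozen=True)
class GpuView:
    """One guest's slice of VRAM: host addresses [base, base + size)."""
    name: str
    base: int
    size: int

    def translate(self, guest_addr, length=1):
        """Map a guest VRAM address to a host one, refusing anything that would
        reach outside this view (and so into another guest's memory)."""
        if guest_addr < 0 or length < 1 or guest_addr + length > self.size:
            raise PermissionError(f"{self.name}: access outside its VRAM slice")
        return self.base + guest_addr

def partition(total, vm_names, host_reserved):
    """Give the host a fixed slice and split the rest evenly between VMs."""
    views = [GpuView("host", 0, host_reserved)]
    per_vm = (total - host_reserved) // len(vm_names)
    for i, name in enumerate(vm_names):
        views.append(GpuView(name, host_reserved + i * per_vm, per_vm))
    return views
```

With an 8 GiB card, 2 GiB for the host, and two VMs, each VM sees its own 3 GiB starting at address 0, and the translation is one comparison and one addition, which is why this part should add essentially no overhead.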

This is planned for as early as Zaps/Linux, an implementation of some of the ZapsOS ideas on top of the Linux kernel; effectively, an alternative userspace.