user2979872

Reputation: 467

How computer CPU executes a Software Application

I am extending the question "What happens when a computer program runs?" using the discussion on the Stanford CS101 page "Software: Running Programs". The CS101 page says:

The machine code defines a set of individual instructions. Each machine code instruction is extremely primitive, such as adding two numbers or testing if a number is equal to zero. When stored, each instruction takes up just a few bytes. When we said earlier that a CPU can execute 2 billion operations per second, we meant that the CPU can execute 2 billion lines of machine code per second.

A program, such as Firefox, is made up of a sequence of millions of these very simple machine code instructions. It's a little hard to believe that something as rich and complicated as Firefox can be built up out of instructions that just add or compare two numbers, but that is how it works. A sand sculpture can be rich and complicated when viewed from a distance, even though the individual grains of sand are extremely simple.

What I don't understand is how the Firefox window or GUI can be translated into simple CPU instructions that just add or compare two numbers. How can one find out which instructions the CPU actually executes to bring up the Firefox window? What about a search typed into the search bar? What CPU instructions does that translate into?

If Firefox is too complicated an example, what about a simple application like Notepad? Is it actually possible to see all the instructions executed from starting Notepad, through typing ABCDEFGHIJ, to saving the text as test.txt?

Upvotes: 1

Views: 860

Answers (2)

user123

Reputation: 2884

As a short answer, the Firefox window relies on system calls. The syscall instruction makes the program jump from user mode into the kernel, at the address stored in the IA32_LSTAR model-specific register (MSR). A system call can, for example, write to a file or send data over a socket (which is how an application talks to the display server that draws its window). The keyboard is polled by the USB host controller (an xHCI controller, an interface first specified by Intel); when the OS detects that a key was pressed, it delivers a message or event to the application that currently has focus. From kernel mode, hardware such as the GPU (to draw on the screen) or the xHCI controller (to read and write USB devices) is driven through MMIO (memory-mapped I/O) on PCI devices, which can also do DMA. Almost every device today sits on PCI Express. A PCI device does DMA when it reads or writes RAM directly, without the CPU copying the data. It is accessed through MMIO because its registers appear at physical addresses (assigned through its base address registers, or BARs) that the CPU reads and writes with ordinary load and store instructions, even though those accesses go to the device rather than to RAM. Writing these special registers tells the device to do something (store the pressed key at some location in RAM, change a pixel's color, etc.).

The longer answer is rather complex, so I'll break it into smaller pieces. Some of what I say may be wrong, because I am writing mostly from memory, but I have tried to be factual; feel free to correct anything that is wrong. I'll use Linux on x86-64 as the example. Things work similarly on Windows.

Syscalls

The machine code defines a set of individual instructions. Each machine code instruction is extremely primitive, such as adding two numbers or testing if a number is equal to zero.
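The quoted idea can be made concrete with a toy machine. The sketch below is a hypothetical three-instruction machine (not x86; the instruction names are made up for illustration) that multiplies two numbers using only add, decrement and a conditional jump, the same kind of primitives the quote describes. Real programs are built the same way, just with far more instructions.

```python
"""A toy register machine (hypothetical, not x86) whose only operations are
add, decrement and jump-if-not-zero, showing how richer behaviour such as
multiplication is built out of primitive steps."""


def run(program, regs):
    """Execute `program` (a list of tuples) on the register dict `regs`.

    Returns the final registers and the number of instructions executed.
    """
    pc = 0          # program counter: index of the next instruction
    executed = 0
    while True:
        op, *args = program[pc]
        executed += 1
        if op == "add":            # regs[dst] += regs[src]
            dst, src = args
            regs[dst] += regs[src]
            pc += 1
        elif op == "dec":          # regs[r] -= 1
            (r,) = args
            regs[r] -= 1
            pc += 1
        elif op == "jnz":          # jump to target if regs[r] != 0
            r, target = args
            pc = target if regs[r] != 0 else pc + 1
        elif op == "halt":
            return regs, executed
        else:
            raise ValueError(f"unknown instruction {op!r}")


# acc = a * b, computed as "add a to acc, b times" (assumes b >= 1)
MULTIPLY = [
    ("add", "acc", "a"),   # 0: acc += a
    ("dec", "b"),          # 1: b -= 1
    ("jnz", "b", 0),       # 2: loop back while b != 0
    ("halt",),             # 3: stop
]

if __name__ == "__main__":
    regs, steps = run(MULTIPLY, {"acc": 0, "a": 6, "b": 7})
    print(regs["acc"], steps)  # prints: 42 22
```

Each loop iteration costs three primitive instructions, so 6 × 7 takes 22 steps including the final halt.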

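x86-64 processors implement one such set of instructions, called an instruction set architecture (ISA). x86 processors come mostly from two manufacturers, AMD and Intel. AMD licenses the x86 architecture from Intel to manufacture its own processors (the 64-bit extension itself, x86-64, was designed by AMD, and the two companies cross-license their technology).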

The paging mechanism was introduced with the first 32-bit x86 processor (the 80386); earlier processors had only segmentation. Paging lets the kernel set or clear a user/supervisor (U/S) bit in each page-table entry to mark the page as supervisor-only or user-accessible. Code running in user mode cannot access a supervisor page, which isolates the kernel from user-mode programs (https://wiki.osdev.org/Paging).
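The U/S bit described above can be illustrated by decoding page-table entries by hand. This is a minimal sketch: the bit positions (bit 0 Present, bit 1 R/W, bit 2 U/S, bit 63 NX, bits 12 to 51 physical frame) follow the x86-64 paging format, while the two example entries are made-up values rather than anything read from a real system.

```python
"""Decode the permission bits of an x86-64 page-table entry (PTE)."""

PRESENT = 1 << 0
WRITABLE = 1 << 1
USER = 1 << 2                       # the user/supervisor (U/S) bit
NO_EXECUTE = 1 << 63
ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 12..51: physical frame address


def decode_pte(entry: int) -> dict:
    """Split one 64-bit entry into the flags discussed above."""
    return {
        "present": bool(entry & PRESENT),
        "writable": bool(entry & WRITABLE),
        "user": bool(entry & USER),
        "no_execute": bool(entry & NO_EXECUTE),
        "frame": entry & ADDR_MASK,
    }


def user_mode_can_access(entries) -> bool:
    """A user-mode access succeeds only if every entry on the page walk
    (PML4, PDPT, PD, PT) is present and has U/S set."""
    return all(e & PRESENT and e & USER for e in entries)


# Hypothetical example entries
kernel_page = 0x0000_0000_0123_4003   # present, writable, supervisor-only
user_page = 0x8000_0000_0567_8007     # present, writable, user, no-execute

if __name__ == "__main__":
    print(decode_pte(kernel_page))
    print(decode_pte(user_page))
```

Checking every level of the walk mirrors how the hardware combines the U/S bits: one supervisor-only entry anywhere on the path makes the page inaccessible from user mode.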

One of the instructions in the instruction set is syscall, which is encoded as the two bytes 0F 05. x86-64 assemblers accept the syscall mnemonic and assemble it into that encoding.

The syscall instruction makes the processor jump to the address stored in the IA32_LSTAR MSR (model-specific register, MSR number 0xC0000082). Only the kernel can write this register, so it provides a secure way to enter the kernel from user mode: the kernel stores the address of its entry point there. On Linux, the entry point is in arch/x86/entry/entry_64.S. That file is written in assembly and calls C functions to do the main work.

Each system call is identified by a number passed in the RAX register, and these numbers vary from OS to OS. On Linux x86-64, up to six arguments are passed in RDI, RSI, RDX, R10, R8 and R9. Windows uses its own syscall numbers and its own argument conventions.
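A minimal sketch of the Linux x86-64 convention described above, showing which register each value goes in before syscall runs. The numbers listed (read 0, write 1, open 2, close 3, getpid 39) are Linux x86-64 values and differ on other architectures and OSes. `raw_getpid` assumes x86-64 Linux with glibc, whose generic `syscall()` wrapper loads these registers and executes the syscall instruction.

```python
"""Linux x86-64 system-call convention, sketched in Python."""
import ctypes
import os
import platform
import sys

# A few entries from the Linux x86-64 syscall table
X86_64_LINUX_SYSCALLS = {"read": 0, "write": 1, "open": 2, "close": 3, "getpid": 39}
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def register_setup(name, *args):
    """Return the register values a program would load before `syscall`."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("Linux system calls take at most six arguments")
    regs = {"rax": X86_64_LINUX_SYSCALLS[name]}   # syscall number in RAX
    regs.update(zip(ARG_REGISTERS, args))          # arguments in order
    return regs


def raw_getpid():
    """Call getpid through glibc's syscall() wrapper (x86-64 Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)       # symbols already loaded, incl. libc
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(X86_64_LINUX_SYSCALLS["getpid"]))


if __name__ == "__main__":
    # write(fd=1, buf=<some address>, count=5)
    print(register_setup("write", 1, 0x1000, 5))
    if sys.platform.startswith("linux") and platform.machine() == "x86_64":
        print(raw_getpid(), os.getpid())           # the two should match
```

Going through `syscall()` keeps the example in Python; in C or assembly you would load the registers yourself and execute the 0F 05 instruction.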

Once the syscall instruction has executed, the processor is running kernel-mode code, and that code has access to all of RAM and to all I/O devices.

Booting

To understand how the GUI is brought up, you need to understand the boot process. Today, computers boot with UEFI. The UEFI standard defines boot services (called through function tables rather than through a syscall instruction) that are available at boot time, acting as a kind of small operating system. The UEFI firmware sets up this small operating system so that the OS loader can use it to set up the computer.

These UEFI services let a boot loader read a file from disk, get the ACPI tables, set the graphics mode, get the memory map, etc. The UEFI firmware has built-in drivers for the hardware needed to boot (disks, display, keyboard). This gives the OS a boot interface for loading the kernel file from disk without needing large temporary drivers of its own in the boot loader.

OS developers therefore provide a UEFI application, in practice built with either EDK2 or gnu-efi. That application uses the boot services to load the kernel file from disk and then jumps to the kernel's entry point.

The kernel will then take control of everything and set up its own syscall interface.

On Linux, booting is quite involved, especially since the advent of systemd. The Linux kernel starts /sbin/init as the first user-space process (PID 1). In most recent distributions, /sbin/init is a symlink to systemd. systemd reads unit files from disk: configuration files that tell it what to do and which other processes to start. Among those processes is the graphical desktop itself, usually started through a display manager.
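To show what a unit file looks like, here is a sketch that parses one with Python's configparser. The unit is hypothetical (the `example-dm` name and its path are made up); `Alias=display-manager.service` mirrors the alias that display managers' units commonly install. configparser only approximates systemd's syntax (for example, it does not handle repeated keys the way systemd does), so this is an illustration of the [Section] Key=Value layout, not a systemd parser.

```python
"""The [Section] / Key=Value layout of a (hypothetical) systemd unit file."""
import configparser

EXAMPLE_UNIT = """
[Unit]
Description=Example display manager (hypothetical)
After=systemd-user-sessions.service

[Service]
ExecStart=/usr/bin/example-dm
Restart=always

[Install]
Alias=display-manager.service
"""


def parse_unit(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str          # keep key case (ExecStart, not execstart)
    parser.read_string(text)
    return parser


unit = parse_unit(EXAMPLE_UNIT)

if __name__ == "__main__":
    print(unit["Service"]["ExecStart"])   # the program systemd would start
```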

The X server

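The X server is a special program that is started early on Linux distributions that use X11. As its name says, it is a server: applications (its clients) communicate with it over sockets, usually a local Unix-domain socket, although the server can also run on another machine. Sockets are provided by the kernel through system calls; C and C++ programs use them through the C library (socket(), connect(), etc.), not through libstdc++.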
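Where exactly a client connects is determined by the DISPLAY environment variable. This sketch assumes the conventional layout: ":N" (or "unix:N") means the local Unix-domain socket /tmp/.X11-unix/XN, and "host:N" means TCP port 6000 + N on that host. `connect_to_x` only succeeds if an X server is actually listening.

```python
"""Map an X11 DISPLAY name to the socket a client would connect to."""
import os
import socket


def parse_display(display: str):
    host, sep, rest = display.rpartition(":")
    if not sep or not rest:
        raise ValueError(f"not an X display name: {display!r}")
    number = int(rest.split(".", 1)[0])      # drop the optional ".screen" suffix
    if host in ("", "unix"):
        return ("unix", f"/tmp/.X11-unix/X{number}")
    return ("tcp", host, 6000 + number)


def connect_to_x(display=None) -> socket.socket:
    """Open the socket an X client would use (assumes a server is running)."""
    kind, *addr = parse_display(display or os.environ["DISPLAY"])
    if kind == "tcp":
        return socket.create_connection((addr[0], addr[1]))
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(addr[0])
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    print(parse_display(":0"))              # the usual local display
    print(parse_display("remotehost:10.0"))
```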

Clients use the Xlib library (libX11), which provides functions that handle the communication with the X server over the socket.
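The first thing Xlib sends over that socket is the connection setup request from the X Window System protocol (core protocol version 11.0). The sketch below only builds those bytes, to show that "talking to the X server" is ordinary data written to a socket; it does not send them or parse the server's reply, and the authorization values in the usage are placeholders.

```python
"""Build the X11 connection setup request that a client sends first."""
import struct


def pad4(n: int) -> int:
    """Bytes of padding needed to round n up to a multiple of 4."""
    return (4 - n % 4) % 4


def setup_request(auth_name: bytes = b"", auth_data: bytes = b"") -> bytes:
    header = struct.pack(
        "<BxHHHH2x",
        0x6C,             # byte order: 0x6C ('l') little-endian, 0x42 ('B') big-endian
        11, 0,            # protocol major / minor version
        len(auth_name),   # length of the authorization protocol name
        len(auth_data),   # length of the authorization data
    )
    return (header
            + auth_name + b"\0" * pad4(len(auth_name))
            + auth_data + b"\0" * pad4(len(auth_data)))


if __name__ == "__main__":
    print(setup_request().hex())   # prints: 6c000b000000000000000000
    # With a (placeholder, all-zero) 16-byte MIT-MAGIC-COOKIE-1 cookie:
    print(len(setup_request(b"MIT-MAGIC-COOKIE-1", bytes(16))))
```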

The X server reads the character devices in the /dev/input/ directory to get input from the various input devices.
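What comes out of those devices is a stream of fixed-size records. This sketch decodes the evdev `struct input_event` format, assuming a 64-bit Linux system where the struct is a `struct timeval` (two longs) followed by a 16-bit type, a 16-bit code and a 32-bit value, 24 bytes in total. The constants come from linux/input-event-codes.h; the device path passed to `read_keys` (e.g. /dev/input/event3) differs per machine and normally needs root or membership in the input group.

```python
"""Decode evdev events as read from /dev/input/event* character devices."""
import struct

EVENT = struct.Struct("llHHi")   # tv_sec, tv_usec, type, code, value (native sizes)
EV_SYN, EV_KEY = 0x00, 0x01
KEY_A = 30
KEY_STATES = {0: "release", 1: "press", 2: "repeat"}


def decode(buf: bytes):
    """Yield (type, code, value) for each complete event in buf."""
    for _sec, _usec, etype, code, value in EVENT.iter_unpack(buf):
        yield etype, code, value


def read_keys(path):
    """Print key events from an input device until interrupted."""
    with open(path, "rb", buffering=0) as dev:
        while True:
            # read(2) on an evdev device returns whole events only
            for etype, code, value in decode(dev.read(EVENT.size * 16)):
                if etype == EV_KEY:
                    print(f"key code {code}: {KEY_STATES.get(value, value)}")


if __name__ == "__main__":
    sample = EVENT.pack(0, 0, EV_KEY, KEY_A, 1) + EVENT.pack(0, 0, EV_SYN, 0, 0)
    print(list(decode(sample)))   # a press of A followed by a sync event
```

Pressing A produces a key event with code 30 and value 1, then an EV_SYN marker that closes the batch; the X server (usually through a library such as libinput) turns these into the key events applications receive.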

To draw on the screen, the X server calls into libdrm, which in turn makes system calls. libdrm opens a file in /dev/dri/ such as card0 or card1, with one file per GPU (which number belongs to the integrated or the discrete GPU depends on the order in which the drivers probe the devices). The library then makes ioctl calls (https://man7.org/linux/man-pages/man2/ioctl.2.html) on the card* file to control the graphics card directly (http://betteros.org/tut/graphics1.php).

The Mesa3D project provides open-source drivers for many graphics cards. Its NVIDIA support has historically lagged because NVIDIA did not cooperate. NVIDIA instead ships its own closed-source drivers, which are kernel modules that can be loaded while the kernel is running.

These closed-source drivers come with their own implementation of the OpenGL library. Once such a driver is enabled, the X server uses that library (together with the driver's GLX library) to draw on the screen. Otherwise, the X server uses a framebuffer mode or the VESA mode of the graphics card.

Character devices

You have probably heard the phrase "everything is a file" in Linux. It comes from the virtual filesystem, which presents most devices to user mode as files. These special files come in several types, one of which is the character device.

Character devices support the open, read, write and ioctl system calls. The X server reads from character devices to gather input from your system's input devices.
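Opening and reading a character device uses exactly the same system calls as a regular file; only the kernel driver behind it differs. This sketch uses /dev/zero and /dev/null, assuming they exist (they do on essentially every Linux system, though unusual containers may differ).

```python
"""Character devices are used through the ordinary file system calls."""
import os
import stat


def is_char_device(path: str) -> bool:
    return stat.S_ISCHR(os.stat(path).st_mode)


def device_numbers(path: str):
    """(major, minor): which driver handles the device, and which instance."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)


def read_from_device(path: str, n: int) -> bytes:
    fd = os.open(path, os.O_RDONLY)   # open(2)
    try:
        return os.read(fd, n)         # read(2): the driver decides what comes back
    finally:
        os.close(fd)                  # close(2)


if __name__ == "__main__":
    for dev in ("/dev/zero", "/dev/null"):
        print(dev, is_char_device(dev), device_numbers(dev), read_from_device(dev, 4))
```

The /dev/zero driver answers every read with zero bytes, while /dev/null reports end of file; an input device under /dev/input/ answers with input events instead.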

Drivers

Read the following: How does loading of kernel module work in linux?

PCI

Your mouse and keyboard are probably USB devices. The computer talks to USB through an xHCI (eXtensible Host Controller Interface) controller. Intel wrote the xHCI specification, but controllers implementing it are built by several vendors, including AMD (in its processors and chipsets), ASMedia and Renesas.
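The MMIO idea from the short answer can be sketched with the xHCI capability registers, which sit at the start of the controller's memory-mapped region (the one its PCI BAR0 points to). A kernel driver reads them through a virtual mapping of that region. Here a bytes object stands in for the mapping and the register values are made up; the offsets and bit fields follow the xHCI specification's capability register layout.

```python
"""Parse xHCI capability registers from a (simulated) MMIO region."""
import struct


def read32(mmio: bytes, offset: int) -> int:
    """A 32-bit little-endian register read at `offset` into the region."""
    return struct.unpack_from("<I", mmio, offset)[0]


def parse_xhci_caps(mmio: bytes) -> dict:
    caplength = mmio[0x00]                                # CAPLENGTH (8 bits)
    hciversion = struct.unpack_from("<H", mmio, 0x02)[0]  # HCIVERSION (BCD)
    hcsparams1 = read32(mmio, 0x04)                       # HCSPARAMS1
    return {
        "operational_regs_offset": caplength,             # operational registers start here
        "version": f"{hciversion >> 8:x}.{hciversion & 0xFF:02x}",
        "max_slots": hcsparams1 & 0xFF,                   # bits 7:0
        "max_interrupters": (hcsparams1 >> 8) & 0x7FF,    # bits 18:8
        "max_ports": hcsparams1 >> 24,                    # bits 31:24
        "doorbell_offset": read32(mmio, 0x14) & ~0x3,     # DBOFF, bits 1:0 reserved
        "runtime_regs_offset": read32(mmio, 0x18) & ~0x1F,  # RTSOFF, bits 4:0 reserved
    }


# Hypothetical register contents standing in for a real controller's BAR0
fake = bytearray(0x20)
fake[0x00] = 0x20                                             # CAPLENGTH
struct.pack_into("<H", fake, 0x02, 0x0110)                    # xHCI 1.10
struct.pack_into("<I", fake, 0x04, (16 << 24) | (8 << 8) | 64)
struct.pack_into("<I", fake, 0x14, 0x2000)                    # DBOFF
struct.pack_into("<I", fake, 0x18, 0x1000)                    # RTSOFF
caps = parse_xhci_caps(bytes(fake))

if __name__ == "__main__":
    print(caps)
```

On real hardware the same offsets are read with exact-width loads from mapped device memory, and writing the operational and doorbell registers is how the driver tells the controller what to do.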

You can read my answer there for precise info on how it works: https://cs.stackexchange.com/questions/141870/when-are-a-controllers-registers-loaded-and-ready-to-inform-an-i-o-operation/141918#141918

Upvotes: 2

wxz

Reputation: 2546

As I mentioned in the comments, a disassembler such as objdump -d on Linux can take a binary/executable file and produce the assembly instructions that make up the entire program.

For instance, running objdump -d on notepad.exe (not the most natural choice, since Notepad is a Windows program and objdump is a GNU tool mostly used on Linux, although it does read the Windows PE format) gives:

notepad.exe:     file format pei-x86-64


Disassembly of section .text:

0000000140001000 <.text>:
   140001000:   cc                      int3   
   140001001:   cc                      int3   
   140001002:   cc                      int3   
   140001003:   cc                      int3   
   140001004:   cc                      int3   
   140001005:   cc                      int3   
   140001006:   cc                      int3   
   140001007:   cc                      int3   
   140001008:   40 55                   rex push %rbp
   14000100a:   48 8d 6c 24 e1          lea    -0x1f(%rsp),%rbp
   14000100f:   48 81 ec d0 00 00 00    sub    $0xd0,%rsp
   140001016:   48 8b 05 8b 14 03 00    mov    0x3148b(%rip),%rax        # 0x1400324a8
   14000101d:   48 33 c4                xor    %rsp,%rax
   ...

I'm using objdump because I'm on Linux, but as @PeterCordes pointed out in the comments, the assembly instructions should be the same with a Windows disassembler.

The objdump output has more than 43k instructions, so working out what each part does would take forever. These are all the instructions contained in notepad.exe itself; at run time Notepad also executes code in the system DLLs it calls (and in the kernel), which is not part of this listing. So if you want to know which instructions are executed, and in what order, when you do something like type ABC and save it, you would need a tracer or debugger (e.g. gdb) to step through only the instructions that actually execute.
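A figure like "43k instructions" can be reproduced by counting instruction lines in the objdump listing. This sketch counts what is statically in the file after objdump's linear sweep (which also decodes padding such as the int3 filler), not what executes at run time. `disassemble` assumes GNU binutils' objdump is on PATH; `count_instructions` works on any saved listing. Prefixes such as rex are counted under their own name, as objdump prints them.

```python
"""Count instructions (by mnemonic) in `objdump -d` output."""
import re
import subprocess
from collections import Counter

# "   140001008:\t40 55   \trex push %rbp": address, bytes, then a mnemonic.
# Continuation lines holding only extra bytes have no mnemonic and are skipped.
INSN = re.compile(
    r"^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*(?![0-9a-f]{2}(?:\s|$))([a-z(]\S*)"
)


def count_instructions(listing: str) -> Counter:
    """Map each mnemonic to how often it appears in an objdump listing."""
    counts = Counter()
    for line in listing.splitlines():
        m = INSN.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def disassemble(path: str) -> str:
    """Run `objdump -d` on a binary and return its text output."""
    result = subprocess.run(["objdump", "-d", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    import sys
    counts = count_instructions(disassemble(sys.argv[1]))
    print(sum(counts.values()), "instructions;", counts.most_common(5))
```

Run as `python count_insns.py notepad.exe`; the total is an upper bound on what the program itself could execute, while a debugger or tracer is still needed to see the path actually taken.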

Upvotes: 2
