Everything you need to know to get started with assembly

Assembly is the closest you can ever come to “talking” to a computer.

Programmable devices have fascinated people for a long time, even before the advent of computers. Two centuries ago there was the music box, a small mechanism that played music encoded as pins on a cylinder. It didn’t take long before the cylinder could be swapped out so that new tunes could be played by the same mechanism. These are not even the oldest programmable devices. Bits may have replaced pins, hard drives may have replaced cylinders, and transistors may have replaced cogs and gears, but the principle of the programmable machine remains the same: a device that is built once, yet is able to follow instructions that were probably never imagined when it was built.

A music box with interchangeable cylinders for different tunes.

You might wonder what this has to do with assembly, so here it goes: programming in assembly is like placing the individual pins on the cylinder that will later turn the gears that play the song. Before assembly, computers had to be programmed by speaking their language directly, in 0s and 1s. For example, the sequence 10110111 00001011 may instruct the computer to store the value 11 in a certain portion of its memory. Here the first 4 bits of the instruction (1011) may stand for “move value”, the second set of 4 bits (0111) may identify the specific location the value will be moved to, and the last 8 bits (00001011) are the binary encoding of the decimal number 11. The first part of the instruction is called the opcode, or operation code. It is like telling the computer which note to play. What assembly does is give this machine-speak a human face that is a bit more understandable.
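To make the bit layout above concrete, here is a minimal C sketch that splits such a 16-bit word into its three fields. The encoding is the made-up one from the paragraph above, not a real instruction format, and the names ToyInstruction and toy_decode are our own, chosen purely for illustration.

```c
#include <stdint.h>

/* Fields of the made-up 16-bit instruction from the text (not a real encoding). */
typedef struct {
    uint8_t opcode;   /* first 4 bits: which operation, e.g. 1011 = "move value" */
    uint8_t dest;     /* next 4 bits: where the value should go */
    uint8_t value;    /* last 8 bits: the value itself */
} ToyInstruction;

/* Split a 16-bit word into its three fields using shifts and masks. */
ToyInstruction toy_decode(uint16_t word)
{
    ToyInstruction ins;
    ins.opcode = (uint8_t)((word >> 12) & 0x0F);
    ins.dest   = (uint8_t)((word >> 8) & 0x0F);
    ins.value  = (uint8_t)(word & 0xFF);
    return ins;
}
```

The sequence 10110111 00001011 is 0xB70B in hex, so toy_decode(0xB70B) yields opcode 1011, destination 0111 and the value 11.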

In assembly, this instruction might be written as:

        mov     ax, 11

Here ‘ax’ is the name of a register, a small storage location inside your processor. In a high-level programming language you might write the same thing as a simple assignment:

        x = 11

At some level, however, the assembly instruction above is what the CPU ultimately executes. If you want, you can take a piece of assembly code and look at its equivalent binary machine code (although it is usually presented in hex), and you will see exactly what the computer sees when the instruction is executed. Of course, the examples we gave above are mostly made up. They are not the exact opcodes and binary representations for these instructions. In fact, these instructions are not universal, and neither are their opcodes. There is no single assembly language syntax, since assembly is closely tied to the architecture of the machine it is intended to run on. As we mentioned earlier, it is basically machine language written in English letters and symbols.

Knowledge of assembly therefore requires knowledge of the machine on which it will run, unlike C or Python code, which simply needs to be compiled (or interpreted) on a different platform to run there. This is part of the reason why high-level languages have almost completely replaced assembly code. Assembly language has a reputation for being horrible, but that is not the whole truth.

Significance

Whichever programming language you use, the end result is machine code, which is what assembly language represents. It is obviously useful to know something about the end result of your code. Think about the difference between knowing how to drive a car and knowing how the engine and other components of a car work. It is possible to know how to drive without knowing anything about spark plugs and brake fluid, but understanding how the car works puts you in a better position, especially when the car breaks down. In our case, knowing assembly lets you peek at the instructions behind your instructions while debugging software. Given how low-level assembly language is, it is also possible to improve performance by coding directly in assembly. However, given the complexity of today's software, it is impractical to develop it entirely in assembly, especially since such software would have to be rewritten for each platform it is intended to run on. What is usually done instead is to write only the performance-sensitive parts in assembly and the rest in a high-level language. These are usually sections of code that are called thousands or even millions of times.

For example, a game may have to continuously evaluate the distance between two points. This part of the code can be called thousands of times to render a single frame. It makes sense to write it in assembly and gain a boost in performance. However, this code then needs to be rewritten for each platform.

Speaking of platforms and their differences, there are two major schools of CPU design that also have a major impact on their assembly language. These are Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC). The strategy of RISC design is to have fewer possible instructions, but to execute them faster. The goal of CISC is to have more instructions, each of which is more powerful. For example, a CISC processor may have an instruction that adds data from one memory location to data from another memory location. A RISC processor, on the other hand, would require multiple steps: first instructing the CPU to load the data from both memory locations into registers one after the other, then adding them, and finally storing the result back to memory. CISC may seem like the better bet here, doesn't it?
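The memory-to-memory addition described above can be sketched in C with a toy machine model. Everything here (ToyMachine, cisc_add_mem, the risc_ helpers) is a hypothetical illustration of the two styles rather than a model of any real CPU, and the counter only tallies how many toy instructions a program used, not how long they would take.

```c
#include <stdint.h>

/* A toy machine: a small memory and a few registers. Purely illustrative. */
typedef struct {
    int32_t mem[8];
    int32_t reg[4];
    int instructions_issued;   /* how many toy instructions the program used */
} ToyMachine;

/* CISC style: a single instruction adds mem[src] into mem[dst]. */
void cisc_add_mem(ToyMachine *m, int dst, int src)
{
    m->mem[dst] += m->mem[src];
    m->instructions_issued += 1;
}

/* RISC style: only loads and stores touch memory; arithmetic uses registers. */
void risc_load(ToyMachine *m, int r, int addr)
{
    m->reg[r] = m->mem[addr];
    m->instructions_issued += 1;
}

void risc_add(ToyMachine *m, int rd, int rs)
{
    m->reg[rd] += m->reg[rs];
    m->instructions_issued += 1;
}

void risc_store(ToyMachine *m, int r, int addr)
{
    m->mem[addr] = m->reg[r];
    m->instructions_issued += 1;
}

/* The same memory-to-memory addition written as a RISC sequence. */
void risc_add_mem(ToyMachine *m, int dst, int src)
{
    risc_load(m, 0, dst);    /* load the first operand */
    risc_load(m, 1, src);    /* load the second operand */
    risc_add(m, 0, 1);       /* add them in registers */
    risc_store(m, 0, dst);   /* write the result back to memory */
}
```

Both routines leave the same value in memory; the CISC version counts as one instruction and the RISC version as four, even though the underlying work is the same.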

You can still purchase an 8085 microprocessor training kit that lets you program the CPU by feeding it one instruction at a time.

But the reality is that the CISC CPU still has to do everything that the RISC CPU does. The fact that it expresses the steps as a single instruction does not change the amount of work behind that instruction. The CISC add operation may even end up taking longer than the equivalent sequence of RISC instructions. The most famous examples of CISC CPUs are those from Intel and AMD, in other words x86. RISC is well represented by ARM, MIPS, AVR, PowerPC and most other CPU designs. It is an accident of history that most desktop and laptop computers use Intel / AMD CISC CPUs while most mobile platforms use ARM or MIPS RISC CPUs. In fact, a number of the world’s top 500 supercomputers are built on RISC designs. The PlayStation 3’s Cell processor was also a RISC processor. Over time, x86 CISC processors have evolved, increasing the number of instructions and their capabilities. Commonly run sequences of instructions have been turned into optimized single instructions. Think of buzzwords like MMX, 3DNow!, SSE, SSE2, etc. These are all extensions to the x86 CPU instruction set. Many of them are powerful instructions that can efficiently perform repetitive operations on larger sets of data.

Knowing assembly means you can take advantage of new instructions before compiler designers take them into account and optimize the generated code to use them. In fact, some special CPU instructions are only available from assembly! A programmer who specializes in assembly has a good knowledge of the underlying hardware, and that also pays off when using a high-level language. The way the hardware is designed means that often small changes in the code (such as swapping the inner and outer loops in a nested loop) can have a huge impact on speed. Why? How? When? These are questions that can only be answered by learning more about machine architecture. Despite the proliferation of high-level languages that hide all the complexities we’ve just talked about, assembly language still has a lot of value, and it can be very useful to know even if you don’t need to code in it.
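The loop-swapping remark above can be tried with a minimal C sketch like the following. The function names and the 512 by 512 size are our own choices for illustration. C stores two-dimensional arrays row by row, so the first function walks memory sequentially while the second jumps a whole row ahead on every step; on many machines the first is noticeably faster because of how caches work, though how large the difference is depends on the hardware and the compiler.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major traversal: the inner loop walks consecutive addresses. */
long sum_row_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major traversal: same arithmetic, but each step jumps a whole row ahead. */
long sum_col_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return the same sum; only the order in which memory is visited differs. Timing the two on your own machine is the interesting experiment.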

IBM Sequoia is the 3rd ranked supercomputer in the TOP500 list, and is made up of over 1.5 million RISC PowerPC A2 processor cores.

Hello World

global main
extern printf

section .data
fmtStr: db      'hello, world', 0xA, 0

section .text
main:
        sub     esp, 4          ; Allocate space on the stack for one 4 byte parameter
        lea     eax, [fmtStr]
        mov     [esp], eax      ; Arg1: pointer to format string
        call    printf          ; Call printf(3):
                                ;       int printf(const char *format, ...);
        add     esp, 4          ; Pop stack once
        ret

The Hello World program above is written for NASM and targets Linux.
Writing something as simple as a Hello World program can be quite a challenge in assembly. The code above uses a little trick: it simply calls the C printf function, which is the bit that actually prints to the screen. The rest of the code merely prepares the call to printf and passes it the ‘hello, world’ message. The ‘section .data’ part of the code instructs the assembler to reserve memory for the ‘hello, world’ string and label it ‘fmtStr’. Then the sub (subtract) instruction, the lea (load effective address) instruction and the mov (move) instruction put the data and pointers in the right place so that the printf function knows where to find them.
This is by no means the only assembly version of Hello World, not even for NASM.
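For comparison, here is a rough C sketch of what the listing above asks the machine to do. The function name say_hello is our own invention for illustration, while fmtStr is borrowed from the listing.

```c
#include <stdio.h>

/* Same name as the label in the NASM listing; the trailing 0 there is the
   terminator that C adds to string literals automatically. */
static const char fmtStr[] = "hello, world\n";

/* Equivalent of the assembly routine: pass one pointer argument to printf. */
int say_hello(void)
{
    return printf(fmtStr);
}
```

Calling say_hello() prints the message and returns whatever printf returns, which is the number of characters written.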

Insertion Sort

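isort:
 ; a = pointer to the array, n = number of elements (stack arguments)
 %define a [ebp + 8]
 %define n [ebp + 12]
 enter 0, 0
 pusha
 ; fewer than two elements: nothing to sort
 cmp dword n, 2
 jl done
 mov ecx, 1                 ; ecx = i, index of the element to insert
 for:
   mov ebx, ecx
   imul ebx, 4
   add ebx, a
   mov ebx, [ebx]           ; ebx = key = a[i]
   mov edx, ecx
   dec edx                  ; edx = j = i - 1
   while:
     cmp edx, 0
     jl while_quit          ; ran off the front of the array
     mov eax, edx
     imul eax, 4
     add eax, a             ; eax = address of a[j]
     cmp ebx, [eax]
     jge while_quit         ; key >= a[j]: insertion point found
     mov esi, [eax]
     mov dword [eax + 4], esi   ; a[j + 1] = a[j]
     dec edx
     jmp while
   while_quit:
   ; store the key at a[j + 1], whichever way the loop was left
   mov eax, a
   mov [eax + edx*4 + 4], ebx
   inc ecx
   cmp ecx, n
   jl for
 done:
 popa
 leave
 ret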

The code above is an insertion sort implemented in the NASM dialect of x86 assembly. The same algorithm implemented for a different architecture, such as ARM or MIPS, would look completely different.
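As a reading aid, here is a minimal C sketch of the same algorithm. The name isort_c is ours, and this is one common way of writing insertion sort rather than a line-by-line translation of the listing; the comments point to the registers that play the corresponding roles.

```c
#include <stddef.h>

/* Sort the n integers in a[] into ascending order, in place. */
void isort_c(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {       /* i corresponds to ecx */
        int key = a[i];                    /* key corresponds to ebx */
        size_t j = i;                      /* slot where key will end up */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];               /* shift the larger element right */
            j--;
        }
        a[j] = key;
    }
}
```

Calling isort_c on the array {5, 2, 9, 1, 5, 6} with n = 6 leaves it as {1, 2, 5, 5, 6, 9}.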

Tools and Learning Resources

As we have made abundantly clear, assembly language is heavily influenced by the target platform it will run on. The best way to get information about an assembly language is directly from the source, the CPU designers themselves. No matter which hardware platform you’re trying to target, just search for the name of the CPU architecture followed by ‘instruction set manual’ and you will find a detailed PDF manual with everything you need to know. For example, search for ‘ARM instruction set manual’ or ‘Intel instruction set manual’. While these manuals are great, they are aimed mostly at people who already have some knowledge of assembly. Assembly languages are difficult to approach, and require a different approach than other languages. First you need to learn about the machine architecture, and only then the actual code. All of this can be hard to figure out on your own. Fortunately, there are many websites today that offer free online college courses with lecture videos, projects, and assignments. One such great course is titled “Hardware/Software Interface” and is available at Coursera.org.

Intel’s Software Developer Manuals, at 1500 pages, are the deepest look you can get into their processor architecture and instruction set.

Another great resource for free, full courses in assembly is available here. They have introductory, intermediate and advanced level courses in x86, x86-64, and ARM. Instead of jumping straight into the Core i7, you should start by looking at the programming of the 8-bit Intel 8085 processor. It may be around 40 years old, but it has a design that you can hold in your head all at once. From there you can work your way up to 16-bit, 32-bit and finally today’s 64-bit processors. Once you know the assembly language, you can start playing with your own code. For this you will need an assembler, software that converts assembly code into machine code. A popular assembler is the Netwide Assembler (NASM), which can run on Windows, OS X or Linux.

The popular GNU Compiler Collection (GCC) includes support for assembly language, and can even compile high-level code to assembly so you can see what your code looks like in assembly. Visual Studio ships with an assembler called MASM (Microsoft Macro Assembler). Another way to interact with assembly code is to disassemble existing code! The source code of compiled software is often difficult or impossible to obtain, but its assembly code is easy to get. On POSIX platforms like OS X / Linux / Unices you can type objdump -d /path/to/compiled/binary to get an assembly representation of the compiled code. Lastly, if you want to play with running code and see how it works in assembly, there is GDB, probably the most powerful debugging tool available.

To conclude, we would like to mention that while assembly is a good place to start on a conceptual level, it should be the last place you get your hands dirty with code. Come back to this article once you are comfortable with a higher-level language of your choice.
