Assembly Tutorial

Posted by Andy at 7:53 PM

Learning assembly is a major pain in the ass because there is little documentation on the subject compared to higher level languages like C++ or Java. Furthermore, different assemblers (the programs that turn your assembly code into working programs) have different ways of doing things, which makes switching between assemblers difficult.

To make things even trickier, many assembly tutorials are written so that the examples only work on one operating system. To break from the mold, this tutorial gives examples that should function on Windows, OS X, Linux, BSD, Solaris and most other systems. In fact, you would have to be using a pretty strange system if these examples didn't work. This tutorial will also point out when parts of the code may vary on other popular assemblers.

So, in light of the lack of *easy to understand* documentation on the internet, I have decided to start writing a massive assembly tutorial, split up into many manageable sections for everyone to read. I have even targeted this tutorial at people who can't program at all, full stop. This tutorial isn't designed as a reference. Later parts build on content from previous parts, so you should read all the parts in the right order.

This part (Part 1) will discuss why you would want to learn assembly and why I chose the options I use in this tutorial. I will also discuss how to use assemblers, which will lead nicely into Part 2, where I will show you how to write a "Hello World" program. "Hello World" is the classic program that people write when learning how to program, and it just displays the phrase "Hello, World!" on the screen when it is run. I know you want to write a "Hello World" program right away, but before we can do this, we need to get a little info, which is what this part is all about.

I will also teach you how to use C functions (features in the C language that give you the ability to achieve certain things), but I do *not* assume you know C, so I will tell you exactly what the functions are doing.

As I said before, all the assembly examples in my tutorial will be portable, so you should be able to assemble them on Windows, OS X, and *nix. That is why you won't see unportable stuff like stand-alone system calls. System calls are a way assembly can do things, but I won't be using them on their own in this tutorial, because they are not portable. I will, however, be teaching how to use 'system call variables' in a later part of this tutorial, which are system calls that are portable. In the earlier parts, I will just use C functions, which are also portable, and a lot easier.

Don't worry if you don't know what 'C functions' or 'system call variables' are; I will explain all of this later on!

Just to say, I haven't copied and pasted off a site, this is completely my work.

Why Assembly

Why the hell would anyone learn assembly when excellent languages like C++ and Java exist? Why would a n00b want to learn assembly before any other language? Well, there are a few reasons, both for and against. Here are a few....

Against Assembly -
> Takes ages to write a simple program.
> Different across operating systems, assemblers and platforms.
> Very easy to crash programs (though on a modern operating system, an ordinary program is very unlikely to damage hardware).
> Little support and documentation.
> Fewer job opportunities than higher level languages.

For Assembly -
> Blazingly fast programs.
> You can reverse engineer a program without the source code (the code the program was built from).
> You can use C functions and system call variables to make code portable. These features will be explained throughout the tutorial.
> You can optimize compiled code generated by C, C++, Pascal etc compilers (compilers turn higher level code like C++ into a working program).
> You can access powerful capabilities in hardware, which is something higher level languages couldn't dream of.
> You can use inline assembly (Assembly code embedded into your higher level code) to speed up trouble spots in higher level languages.
> You can avoid using capabilities exclusive to certain hardware to make your code, again, portable.
> It will be very easy to learn a higher level language once you know assembly, and you will have a solid understanding about how higher level languages actually work.

Which one?

There are many different types of assembly, and there are different ways of doing the same thing. In my opinion, there are a few assemblers you should know about.

MASM -
Microsoft's assembler. This is only for Windows and is not maintained as an individual product anymore, but it is included with Visual Studio.NET. It uses the Intel syntax (see more about syntaxes below). There is another project called MASM32, which is not from Microsoft, but I know little about this project...

NASM -
The Netwide Assembler. It can be installed on many operating systems, including Windows, OS X and of course, *nix. It uses the Intel syntax.

Gas -
The GNU assembler. It is available on many operating systems including Windows, and it's on *nix and OS X by default on many versions. It uses the AT&T syntax by default. It can assemble code for so many platforms, including SPARC, x64 etc.

Other assemblers include FASM (Flat Assembler), YASM (I don't know what its acronym stands for) and SOL_ASM (Solar Assembler).

When I say 'Intel syntax' or 'AT&T syntax', I mean how the language looks. A line of code that performs a certain task may look different from one syntax to another, even though they are both assembly. For example, there may be certain symbols in one syntax that aren't used in the other. One may use one keyword, whilst the other uses a different keyword for the same thing.
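
To give you a taste of the difference, here is a rough sketch of the same task written in both syntaxes. It assumes an x86 build of Gas, and it uses the .intel_syntax and .att_syntax directives (which tell Gas which syntax to expect) purely so both versions can sit in one file. Don't worry about what the instruction does yet; just notice how different the two lines look.

```asm
# Both instructions below put the number 12 into the eax register.

    .intel_syntax noprefix      # tell Gas the next lines use Intel syntax
    mov eax, 12                 # Intel: destination first, no extra symbols

    .att_syntax prefix          # switch back to Gas's default AT&T syntax
    movl $12, %eax              # AT&T: source first, $ before numbers, % before registers
```

So the order of the two operands is swapped, and AT&T adds the $ and % symbols that Intel syntax doesn't use.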

The one we will be using is Gas, which uses the AT&T syntax. This is because it's the assembler that GCC uses, which means it will be easy to integrate C functions into it. So many people use GCC these days that Gas seems a good choice. Gas supports loads of hardware, more than NASM and MASM. It works very well with many GNU tools that are useful for assembly, like the profiler, debugger and compiler. The profiler allows us to check the speed of programs. The debugger lets us go through our programs step by step. The compiler is used to build code that uses C functions (because it's just so much easier than using the assembler directly). I doubt I will be covering these tools in my tutorial, except the assembler and the compiler. This tutorial is about teaching assembly, not how to use the GNU tools, since there is enough documentation on that subject.

The unfortunate thing about Gas is that it uses AT&T syntax rather than Intel. This means code written for Gas will look different from code written for NASM or MASM. But once you've learned assembly, it shouldn't be that hard to learn the differences. Major differences between them will be pointed out throughout the tutorial.

How to assemble stuff

Before I give you the example code in Part 2, I need to tell you how to assemble your code. Firstly, you will be writing your code into a text file. You can use Notepad (I wouldn't recommend it though) for Windows, OS X's text editor, or Gedit in Linux. You don't need to do anything special, you just type your code into it and save it. But here is the difference... when you save it you need to put .s at the end of the file name. This is called a file extension, and it's used to identify the file type to the operating system. .s means assembly source for Gas. Other assemblers like MASM use the .asm extension instead.

For example here is a good file name for some assembly code.

Code:

really_cool_computer_program.s

Here is a file name that isn't good....

Code:

really_cool_computer_program.txt.s

The file name can have spaces, but it's easier to just use underscores instead. This avoids confusion when you tell Gas to assemble your code, which leads us to the next bit...

Using Gas is quite simple. It needs to be done in the command line (Don't worry, no command line knowledge needed, though it helps). Here is what you do....

Open the command line. It's in Start>All Programs>Accessories>Command Prompt on Windows. On OS X, it's in Applications/Utilities/Terminal. On Linux/BSD it's...well.... everywhere.

Now that you're in the evil command line, let's use Gas. Windows users will have to actually find the as.exe file on their system (use the search tool) and manually type the entire location into the command line. OS X and *nix users, on the other hand, should only need to type 'as' into the command line.

Note that the ~ symbol means your Home directory in OS X/*nix.

So, you first type the program in. On Windows it would look something like this (I'm using the Gas included with the popular Dev-C++ compiler, but you could use any) -

Code:

c:\dev-cpp\bin\as.exe

On *nix/OS X, you only have to type this - as
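Now, put a space after that. Then type the address of the assembly code you want to assemble. If the address contains spaces (like the Windows one below), wrap it in double quotes, otherwise the command line will treat each word as a separate file.
On Windows -

Code:

c:\dev-cpp\bin\as.exe "c:\documents and settings\user\desktop\assembly_source.s"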

On *nix/OS X -

Code:

as ~/Desktop/assembly_source.s

Don't hit enter yet....

Now, that's all well and good, but we need to tell the assembler where to dump the 'object code' it creates from your assembly code. Object code is the 'half way' point between an assembly source and a finished program. Add another space, then type '-o', then another space. Then you write the address of the place where you want to put the object code. Notice in the next example how the second address is the same as the first, but it ends with .o rather than .s? The output should always end with .o.

On Windows it would be like this -

Code:

c:\dev-cpp\bin\as.exe "c:\documents and settings\user\desktop\assembly_source.s" -o "c:\documents and settings\user\desktop\assembly_source.o"

And on *nix/OS X

Code:

as ~/Desktop/assembly_source.s -o ~/Desktop/assembly_source.o

Now see that file with .o at the end of it? That's the file that will be converted to a finished program using a process called 'Linking'. You don't need to know how this works right now, just accept that it works =)

Linking a file is the same process as before, but with a few modifications....
Firstly, as is replaced with ld. Secondly, the file that ends with .o is now the first address, not the second. The second address is the finished program. Here is an example in Windows -

Code:

c:\dev-cpp\bin\ld.exe "c:\documents and settings\user\desktop\assembly_source.o" -o "c:\documents and settings\user\desktop\assembly_source.exe"

And on OS X/*nix -

Code:

ld ~/Desktop/assembly_source.o -o ~/Desktop/assembly_source

Note that a Windows program has a .exe extension, whilst a *nix program doesn't have one at all. The same goes for OS X: a command line program like this one has no extension (.app is used for application bundles, which are a different thing).

Assembly code is in sections

An assembly program's source code is divided into multiple sections, which are also known as segments. Each section serves a purpose. Before we can learn about what the bits and bobs of assembly do, we need to know the two basic sections.

They are text and data. In other assemblers, the text section may be called the code section.

So let's start laying out our first assembly source code.

Here is the syntax of starting a section in Gas assembly.

Code:

.section <section name>

So, you tell the assembler it's a section by starting with a .section. Why does it start with a dot? This will be answered later.

The data section has only one name. data. That's profound, huh? So, in the next example, I will declare a section like before, but give it the name data.

Code:

.section .data

Yeah, the name starts with a dot as well. Now, let's make another section. I talked about the text section before, so let's try this. Using the same method as before, I created the text section.

Code:

.section .text

I have shown you how to create the text section, and the data section. Now, let's combine them into the same file. I will put an empty line between them, to make it easier on the eye. It will make the code more readable when more code is there.

Code:

.section .data

.section .text

So, we now have two sections in our file. But what do they do?

The data section

Right. To make a remotely useful program, we need to create something called a variable. A variable stores numbers, strings, and characters. Technically, strings are just a bunch of characters. And even more technically, characters are just numbers from the ASCII code system. ASCII is a system that assigns a number to each character (a capital A is 65, for example), because computers don't have a clue what characters are, only numbers. I'll discuss this more in other parts.

A variable has a name. It has a type. It stores something. Here is a *really* lame example, lol.

Code:

I am a variable. I can only store numbers without a decimal place. My name is "A". I store the number 12.

A variable that stores only whole numbers is an int, or an integer. In Gas assembly it's called an int. And unsurprisingly, like the section stuff, you start it with a dot. This variable can only store numbers without a decimal place, so the number 12 is acceptable, but 2.4 is not, and won't work. We know this, because we now know it's an int. Here is an example of declaring an int -

Code:

.int

But storing an int is useless if we can't get it to do stuff. We get it to do stuff by using its name. So let's give it a name. Let's call it "A".

Code:

A: .int

Yep, the name ends with a colon. If you called it "Hello_Everyone", it would be "Hello_Everyone:" in the assembly code. This is just a part of Gas assembly. So, it's an int that's called "A". Assembly is case sensitive, so "A" is a completely different variable to "a". Now, we've got this far, so let's make this variable store a number -

Code:

A: .int 12

Now, remember that this needs to be in the data section, because we are 'making' data by making variables. So, let's put it in the data section. Note the indent before the line... this makes it look neater. The text section is here because an assembly program needs one in order to do something.

Code:

.section .data
    A: .int 12

.section .text

So, we just made a variable. How about we make more than one? It's the same idea, but we put it on multiple lines, keeping to the tabs.

Code:

.section .data
    A: .int 12
    B: .int 4
    C: .int 20

.section .text

So, in the above example, we made three variables, all in the data section. We can now use these variables in our code.
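
Coming back to the earlier point that characters are just numbers, here is a small sketch of what that looks like in a data section. It uses two Gas directives this part hasn't covered, .byte (a single small number) and .ascii (text), and the variable names are made up for the example. Both lines should end up storing exactly the same thing, because 65 is the ASCII code for a capital A.

```asm
.section .data
    Letter_As_Number: .byte 65      # the number 65, stored in one byte
    Letter_As_Text:   .ascii "A"    # the character A, stored as its ASCII code, which is also 65

.section .text
```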

Variables are stored in memory for the program to access. There is another way of storing things - CPU registers, which will be discussed in a brief moment.

The text section

This is where the program's instructions go. In this section, you can manipulate variables, jump across different sections of the code, and do other cool things. But before we can discuss instructions, we need to know about CPU registers and memory.

CPU registers

There are a load of these things. To avoid unnecessary trauma, I'll only cover 4 of them at first. Presenting...

Code:

eax, ebx, ecx and edx

They store stuff for you. Simple as that. They also store stuff for other things as well, but that's for later.

In Gas assembly, when referring to CPU registers, you prefix them with the % symbol, like so -

Code:

%eax
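
As a small preview, here is roughly what using registers looks like inside the text section. The movl instruction hasn't been introduced yet (instructions come later), so treat this as a sketch rather than something to memorise. It assumes an x86 processor, which is where these four registers live.

```asm
.section .text
    movl $12, %eax      # store the number 12 in the eax register
    movl %eax, %ebx     # copy whatever is in eax into ebx, so both now hold 12
    movl $4, %ecx       # ecx and edx are used in exactly the same way
    movl $20, %edx
```

Notice that every register name has the % in front of it, and every plain number has a $ in front of it.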
