| The Lazy Beginning Reverser's Guide to Windows Assembly |
| Introduction |
Before you begin =), try to understand what hexadecimal (hex) is. If you're
THAT lazy and don't want to until later, suffice to say, just think of them
as numbers, but don't try to add/subtract/multiply/divide them, because your
answer will probably be wrong. This type of number is distinguished from
your everyday number (decimal) because it will have an 'h' at the end of it.
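If you ever do want the decimal value of one of these 'h' numbers, here is a minimal C sketch. The helper name hex_h_to_long is made up for this example (it is not a standard function), and it only handles the plain "digits followed by h" form used in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Convert an assembler-style hex literal such as "203h" to its value.
   Returns -1 if the text is not hex digits followed by a single 'h'.
   (hex_h_to_long is a hypothetical helper name for this sketch.) */
long hex_h_to_long(const char *text)
{
    size_t len;
    char *end;
    long value;

    assert(text != NULL);
    len = strlen(text);
    if (len < 2 || (text[len - 1] != 'h' && text[len - 1] != 'H'))
        return -1;
    value = strtol(text, &end, 16);   /* base 16 = hexadecimal */
    if (end != text + len - 1)        /* parsing must stop exactly at the 'h' */
        return -1;
    return value;
}
```

Calling hex_h_to_long("203h") should hand back the decimal value of 203 hex, and a string without the trailing 'h' gets rejected with -1.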
Don't read the Extra Notes if you don't feel like it. They aren't
essential... not really =).
Semicolons are the symbol that tells the assembler (the thing that puts
together a program written in Assembly) not to bother with anything past it.
So, on the same line, everything after the ';' is ignored and is known by
programmers as a 'comment'.
| Part I: Registers, Flags, The Stack |
Think of registers as a fancy term for 32-bit (it's just a size, calm down if
you don't understand, it doesn't matter all that much at this point)
variables. You use them exactly as you would use any other variables, to
store values that you will need later on. For those who aren't quite as lazy
as some others, here is a quick run-through of the registers that you will
see while reversing a Windows32 program (as opposed to a DOS one):
(This can be skipped if you are that short on time)
General Purpose Registers:
EAX - Commonly used in mathematical operations
EBX - Commonly used as a pointer (if you don't know what this is, don't
worry, it doesn't matter that much right now)
ECX - Commonly used as a looping variable (eg. it stores the value '5' if a
loop needs to run 5 times)
EDX - Similar to EBX
Registers that you should NOT touch if you don't know what you're doing:
CS - 'Code segment'. Basically tells you where you are in memory (think of
it as part of the address that tells you where in your computer the
program is stored)
DS - 'Data segment'. Same as above, but tells you where the data is stored
ES - 'Extra segment'. Ask someone else what the hell this does, I've never
messed with it =).
SS - 'Stack segment'. (See Above)
ESI - 'Source Index'. (See Above)
EDI - 'Destination Index'. (See Above)
EBP - 'Base Pointer'. (See Above)
ESP - 'Stack Pointer'. (See Above)
EIP - 'Instruction Pointer'. I know this one =). It holds the address of the
next instruction to be executed.
By now you should realize that I don't know everything about Assembly. Why?
Because I understand your laziness much better than you might think =).
The 'E' in front of some of the register names just indicates that it is the
32-bit version of the register, to distinguish it from the 16-bit counterpart
that was used in DOS and on 16-bit processors. Each of the General
Purpose Registers can be broken down into parts. For example, 'EAX's lower
half is 'AX'. 'AX' can be divided into 'AH' and 'AL'. AH is the 'high' half
of AX, and AL is the 'low' half of AX.
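The way EAX breaks down into AX, AH and AL can be sketched in C with masks and shifts. The names SubRegs and split_eax are invented for this example; on a real CPU these are simply different views of the same register, not separate copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The smaller registers hidden inside one 32-bit register. */
typedef struct {
    uint16_t ax;   /* lower 16 bits of EAX */
    uint8_t  ah;   /* high 8 bits of AX    */
    uint8_t  al;   /* low 8 bits of AX     */
} SubRegs;

/* Split a 32-bit EAX value into AX, AH and AL using masks and shifts. */
void split_eax(uint32_t eax, SubRegs *out)
{
    assert(out != NULL);
    out->ax = (uint16_t)(eax & 0xFFFFu);
    out->ah = (uint8_t)((eax >> 8) & 0xFFu);
    out->al = (uint8_t)(eax & 0xFFu);
}
```

So if EAX holds 12345678h, AX is 5678h, AH is 56h and AL is 78h.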
Flags don't deserve a whole Part, since I'm not saying much about them and you
don't mess with them much anyway. The only thing I want to point out is that
in debuggers, aside from the program code and the values of the abovementioned
registers, there is also a set of single letters that are (depending on what
you use, they'll be represented differently) either highlighted or not
highlighted. These are 'flags' that basically store TRUE or FALSE. The most
important one to know is the Zero flag. It's obviously the 'Z'. Its use will
be seen when jumps are explained later.
The stack is basically your computer's warehouse. It can be used to store
anything. Why do people bother with registers? Because they're a lot faster
and easier to deal with. In order to properly introduce the stack, I will
need to tell you about your first two assembly instructions. Push and Pop.
Learn to love them, or at least tolerate them because you'll be seeing a LOT
of them. The stack, as its name implies, operates like a stack of plates.
Information gets Push'ed on the stack of plates and if you want to get
information off of the stack, you Pop it off the top. As you have probably
realized, it is a FILO (first in last out) scheme, meaning that information
that gets pushed onto the stack first will be the last to get popped off.
(if you get the point, skip the example)
Example:  push 1h     ; put the value '1' on the stack
          push 2h     ; put the value '2' on the stack
          push 3h     ; put the value '3' on the stack
          pop eax     ; take off the top value (3) and put it into
                      ; eax
          pop ebx     ; take off the top value (now it is 2,
                      ; because 3 got taken already) and put it
                      ; into ebx
          pop ecx     ; take off the top value (now 1) and put it
                      ; into ecx
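For those who think better in C, the same push/pop behaviour can be sketched with a small array. This is only a model: Stack, stack_push and stack_pop are names made up for this example, and the real stack lives in memory and grows toward lower addresses, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

typedef struct {
    uint32_t slot[STACK_SLOTS];
    int count;                        /* number of values currently stored */
} Stack;

/* push: put a value on top of the pile. */
void stack_push(Stack *s, uint32_t value)
{
    assert(s->count < STACK_SLOTS);   /* no room left */
    s->slot[s->count++] = value;
}

/* pop: take the top value off the pile and hand it back. */
uint32_t stack_pop(Stack *s)
{
    assert(s->count > 0);             /* nothing to pop */
    return s->slot[--s->count];
}
```

Pushing 1, 2 and 3 and then popping three times gives back 3, 2 and 1, in that order.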
| Part II: Basic Assembly Instructions |
This is a list of the most common Assembly instructions that a reverser at
your current level (which should be far below NEWB) will need to understand.
If you deal with tougher programs, they might use some weird functions,
which is why you should grab yourself a handy reference guide when you get
around to it. Find it yourself you lazy bum. We have already discussed two
instructions, Push and Pop, so we begin from there:
MOV - um...it moves stuff around =).
Example: mov eax, 203h
This line moves the value 203 hex into the EAX register.
Mathematically it looks like this:
X = 203h
where X is just the EAX register. If you still don't understand, go
back to school.
CALL - it calls the function located at the given address.
Example: call 03828549 ; This is what it will look like in
; a disassembly.
; When actually programming in
; Assembly, tags can be used instead
; (eg. call my_function).
This line calls the function at the address 03828549.
*Extra note: If this line of code were located at address 02955555,
the address of the instruction right after it would be stored on top of the
stack to let the program know where to go back to.
JMP - unconditional jump, it jumps the program to whatever address it is told
to, skipping everything between.
Example: jmp 00300000 ; assume this line is located at
push eax ; thus this line is at 002XXXXX (yes,
; I could tell you where exactly,
... ; but you don't need to know). It is
; not executed because of the jump
push ebp ; If this instruction is at address
; 00300000, it gets executed after
; the jump.
CMP - compare two values. Essentially subtracts the 2nd from the first and
throws away the result. BUT! It sets a TON of flags that are almost
always used in conjunction with a conditional jump of some type. If the
two values are equal, then the subtraction will equal zero and the Zero
flag will be set (will be TRUE). The carry, overflow, sign, and a
couple other flags may get set/unset as a result of the cmp instruction as
well, so many different types of conditional jumps can be used after a
cmp.
JNZ/JZ - conditional jumps, two of many many possible types (eg. JG (greater
than), JL (less than), JGE, JNG, etc.). These two are the more common ones,
Jump if not Zero and Jump if Zero. These check the zero flag which
was mentioned earlier. If the zero flag is set, then the JZ jump
gets taken and the JNZ does not. Very important for any
reverser/cracker because it dictates the flow of the program and
often comes after a cmp of two values (say, maybe the serial number
you enter and the one that it's supposed to be? =P). If the jump is
not taken, the program simply continues on with the next instruction
after the jump.
RET - return, tells the program that it has completed a routine (function)
and needs to return to where it came from.
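To see how CMP, JZ/JNZ, CALL and RET fit together, here is a rough C model of the bits of the processor they touch. Everything in it (the Cpu struct, the do_* names, the idea of passing in the address of the next instruction) is an assumption made for this sketch; it only tracks the Zero flag and ignores the other flags that cmp sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_SLOTS 16

typedef struct {
    uint32_t eip;                  /* address of the next instruction */
    bool     zf;                   /* the Zero flag */
    uint32_t stack[STACK_SLOTS];
    int      count;                /* values currently on the stack */
} Cpu;

/* cmp a, b: subtract b from a, set the Zero flag, throw the result away. */
void do_cmp(Cpu *cpu, uint32_t a, uint32_t b)
{
    cpu->zf = ((uint32_t)(a - b) == 0);
}

/* jz target: jump only if the Zero flag is set, otherwise carry on
   with the instruction right after the jump. */
void do_jz(Cpu *cpu, uint32_t next_instruction, uint32_t target)
{
    cpu->eip = cpu->zf ? target : next_instruction;
}

/* jnz target: the opposite of jz. */
void do_jnz(Cpu *cpu, uint32_t next_instruction, uint32_t target)
{
    cpu->eip = cpu->zf ? next_instruction : target;
}

/* call target: save the address of the instruction after the call on
   the stack, then jump to the function. */
void do_call(Cpu *cpu, uint32_t next_instruction, uint32_t target)
{
    assert(cpu->count < STACK_SLOTS);
    cpu->stack[cpu->count++] = next_instruction;  /* return address */
    cpu->eip = target;
}

/* ret: pop the saved address off the stack and continue from there. */
void do_ret(Cpu *cpu)
{
    assert(cpu->count > 0);
    cpu->eip = cpu->stack[--cpu->count];
}
```

Think of a serial check as do_cmp on the serial you typed and the right one, followed by do_jnz to the 'wrong serial' code: if the two match, the Zero flag is set and the jump is not taken.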
For more complete information about the above topics, search around for
Cruehead's Assembly tutorials on the internet. They're good...they're just a
lot longer than mine =).
| Part III: The Windows API Functions |
Every Windows program (um, I think) makes use of the Windows API functions at
some level. These are a bunch of functions that do all the useful stuff you
would need to do in a Windows program (eg. get a string, make a message box,
etc.). In a disassembly of a program, you will see many calls to API
functions. As for any call, the parameters (fancy word for 'inputs') for the
function are first push'ed on to the stack, and then the function is called.
For a function with two inputs, the 2nd input is pushed and THEN the first
input is pushed. This is because things pushed onto the stack get put on top
of one another and we want the top item on the stack to be the first input
when we call the function. So while trying to understand a portion of code,
pay attention to
why things are being push'ed on to the stack because chances are you will
need to know what values are getting thrown around by the program (say,
perhaps if a function takes two inputs, one being your name and the other
your company, what do you think will happen?). I'd highly suggest you get
yourself an API reference so that you can look up any unfamiliar API
functions. It will help greatly when trying to actually understand a program
and you forget what the 2nd input of the MessageBoxA function is supposed to
be.
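The 'last input gets pushed first' idea can be sketched in C like this. The names push_inputs and read_input are invented for the example, and the model leaves out the return address that the call itself puts on top of the inputs.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

typedef struct {
    uint32_t slot[STACK_SLOTS];
    int count;                        /* number of values currently stored */
} Stack;

static void push(Stack *s, uint32_t v)
{
    assert(s->count < STACK_SLOTS);
    s->slot[s->count++] = v;
}

/* Push the inputs last-to-first, the way the code in front of an API
   call does it. inputs[0] is the first input of the function. */
void push_inputs(Stack *s, const uint32_t *inputs, int n)
{
    int i;
    for (i = n - 1; i >= 0; i--)
        push(s, inputs[i]);
}

/* Read input number `index` (1 = first input) the way the called
   function sees it: counting down from the top of the stack. */
uint32_t read_input(const Stack *s, int index)
{
    assert(index >= 1 && index <= s->count);
    return s->slot[s->count - index];
}
```

When you read a disassembly you go the other way around: the push closest to the call is the first input, the one before it is the second, and so on.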
The following is a list of some of the more cracker-oriented API functions
(not reverser-oriented because a reverser needs to know them all if he/she
wants to understand a program completely, which is the difference between the
two):
int GetWindowText(
    HWND hWnd,        // handle of window or control with text
    LPTSTR lpString,  // address of buffer for text
    int nMaxCount     // maximum number of characters to copy
);
hWnd - Identifies the window or control containing the text.
lpString - Points to the buffer that will receive the text.
nMaxCount - Specifies the maximum number of characters to copy to the buffer.
If the text exceeds this limit, it is truncated.
For those unfamiliar with the C programming language, the above bit is a
function definition for the GetWindowText function. The 'int' right before
the GetWindowText indicates that the function will return an integer value
after it is completed. This translates into plain English as 'GetWindowText
hands whoever called it some number'. The three lines between the parentheses
are the inputs that the function will take. Each line is one type of input
and the HWND is the TYPE of input while the hWnd is just the name that the
function has decided to use for the input. The same goes for the rest of it.
This function is used to, duh, get text from a window. The 'handle' of a
window is kind of like the name that Windows uses to recognize the window
('window' is, in this case, equal to 'program'). In your disassemblies, this
will be some address that is assigned to the program at the very beginning of
the program. When simply cracking, you don't really need to worry about this,
just know that it is the last parameter that will be pushed right before the
call to this function. The most important of these parameters is the second
one. This is the address that the retrieved string will be stored in.
Here is an example:
push 00000016 ; max length of 16h characters (22 characters)
push 00493938 ; retrieved string will be stored at address 00493938
push 00482638 ; handle of window
So now you know that the string it is retrieving will be at address 00493938
and that any future reference (unless it gets replaced with something else)
to this address will be referring to the text that was retrieved by the
GetWindowText function.
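To get a feel for what nMaxCount does to the retrieved string, here is a portable stand-in written in plain C. fake_get_window_text is NOT a Windows function, just a made-up name for this sketch, and it assumes (as far as I recall from the API reference) that the limit counts the terminating null character and that the return value is the number of characters copied without it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GetWindowText: copies window_text into buf,
   never writing more than nMaxCount characters including the final '\0'.
   Returns the number of characters copied, not counting the '\0'. */
int fake_get_window_text(const char *window_text, char *buf, int nMaxCount)
{
    size_t len;

    assert(window_text != NULL && buf != NULL);
    if (nMaxCount <= 0)
        return 0;
    len = strlen(window_text);
    if (len > (size_t)(nMaxCount - 1))
        len = (size_t)(nMaxCount - 1);   /* too long: truncate */
    memcpy(buf, window_text, len);
    buf[len] = '\0';
    return (int)len;
}
```

With a limit of 16h, as in the pushes above, a short name fits whole. With a limit of 5, only the first 4 characters of the text end up in the buffer.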
UINT GetDlgItemText(
    HWND hDlg,        // handle of dialog box
    int nIDDlgItem,   // identifier of control
    LPTSTR lpString,  // address of buffer for text
    int nMaxCount     // maximum size of string
);
hDlg - Identifies the dialog box that contains the control.
nIDDlgItem - Specifies the identifier of the control whose title or text is
to be retrieved.
lpString - Points to the buffer to receive the title or text.
nMaxCount - Specifies the maximum length, in characters, of the string to be
copied to the buffer pointed to by lpString. If the length of the
string exceeds the limit, the string is truncated.
This function is used to do the same as above, but under different
circumstances. Go find out the difference between the two yourself if you are
that hardworking, it has to do with the difference between a 'window' and a
'dialog box'.
int MessageBox(
    HWND hWnd,          // handle of owner window
    LPCTSTR lpText,     // address of text in message box
    LPCTSTR lpCaption,  // address of title of message box
    UINT uType          // style of message box
);
hWnd - Identifies the owner window of the message box to be created.
If this parameter is NULL, the message box has no owner window.
lpText - Points to a null-terminated string containing the message to be
displayed.
lpCaption - Points to a null-terminated string used for the dialog box title.
If this parameter is NULL, the default title Error is used.
uType - Specifies a set of bit flags that determine the contents and behavior
of the dialog box. This parameter can be a combination of flags from
the following groups of flags.
I have cut off the 'groups of flags' because it was a long list. This
function creates a message box. You see these all the time and they are often
used to make the annoying 'you have 30 days left in your trial' messages that
pop up for shareware. The last parameter (uType) basically tells what buttons
are in the message box. So sometimes you see a message box with Abort, Retry,
and Ignore. Other times you see a box with Ok and Cancel. etc. etc. That's
what this parameter does. If you look up the MessageBox function in a Win32
Programmer's reference, it will give you a list of the valid values.
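As a rough sketch of how the button part of uType can be read back out of a pushed value, here is a small C helper. The MB_ values are the ones I know from the Windows SDK header winuser.h, retyped here so the example compiles without windows.h (don't mix it with the real header); button_set is a made-up name, and only the button styles listed are handled.

```c
#include <stddef.h>

/* Button styles and masks, retyped from winuser.h for this sketch. */
#define MB_OK                0x00000000u
#define MB_OKCANCEL          0x00000001u
#define MB_ABORTRETRYIGNORE  0x00000002u
#define MB_YESNOCANCEL       0x00000003u
#define MB_YESNO             0x00000004u
#define MB_RETRYCANCEL       0x00000005u
#define MB_ICONEXCLAMATION   0x00000030u  /* one of the icon flags */
#define MB_TYPEMASK          0x0000000Fu  /* low 4 bits select the buttons */

/* Name the button set encoded in a uType value, or NULL if it is not
   one of the styles listed above. */
const char *button_set(unsigned int uType)
{
    switch (uType & MB_TYPEMASK) {
    case MB_OK:               return "OK";
    case MB_OKCANCEL:         return "OK, Cancel";
    case MB_ABORTRETRYIGNORE: return "Abort, Retry, Ignore";
    case MB_YESNOCANCEL:      return "Yes, No, Cancel";
    case MB_YESNO:            return "Yes, No";
    case MB_RETRYCANCEL:      return "Retry, Cancel";
    default:                  return NULL;
    }
}
```

So if you see a value like 31h pushed as the last parameter of a message box call, the low digit says 'OK, Cancel' and the rest is an icon flag.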
| Conclusion |
Well, that's all I have. If you have any complaints, keep them to yourself.
If you have any useful suggestions about what I should add, parts that are
unclear (please be as specific as you can about this), things that you see as
unneeded, any errors you might find (don't bother telling me about spelling
or grammar errors though), etc. email me at vortex168 at asia dot com. Or
post something in the Feedback Forum at Mala's if you're shy. Maybe I'll
write something that makes use of what this tutorial teaches. Maybe. =).
Originally written by vortex168