# Implementation of the GNU Assembler

Jul 30, 2020

## Concepts

### Sections and Relocation

• Assigning run-time addresses to sections is called relocation.
• An object file written by `as` has at least three sections, any of which may be empty: the text, data and bss sections.

• text section and data section
  • hold your program
  • the text section is often shared among processes: it contains instructions, constants and the like.
  • the data section of a running program is usually alterable: for example, C variables would be stored in the data section.
• bss section
  • contains zeroed bytes when your program begins running
  • used to hold uninitialized variables or common storage
  • was invented to eliminate those explicit zeros from object files
• absolute section
  • addresses that do not change when relocating
• undefined section
  • a catch-all for address references to objects not in the preceding sections.
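A minimal Python model of relocation, assuming each symbol is stored as a (section, offset) pair and that text, data and bss are simply placed one after another from a made-up base address `0x1000`. The layout, the `relocate` function and all symbol names are hypothetical; a real layout is chosen by `ld` and its linker script.

```python
from typing import Dict, Tuple

RELOCATABLE = {"text", "data", "bss"}

def relocate(symbols: Dict[str, Tuple[str, int]],
             bases: Dict[str, int]) -> Dict[str, int]:
    """Return the run-time address of every symbol: section base + offset."""
    result = {}
    for name, (section, offset) in symbols.items():
        if section in RELOCATABLE:
            result[name] = bases[section] + offset
        elif section == "absolute":
            result[name] = offset  # absolute addresses never move
        else:
            # undefined symbols must be resolved against other files first
            raise ValueError(f"{name}: cannot relocate a symbol in section {section!r}")
    return result

# Hypothetical layout policy: sections back to back starting at 0x1000.
text_size, data_size = 0x40, 0x10
bases = {
    "text": 0x1000,
    "data": 0x1000 + text_size,
    "bss": 0x1000 + text_size + data_size,
}
symbols = {
    "main":    ("text", 0x00),
    "loop":    ("text", 0x18),
    "counter": ("data", 0x04),
    "buffer":  ("bss", 0x00),
    "MAGIC":   ("absolute", 0x2A),
}
addrs = relocate(symbols, bases)
for name, addr in addrs.items():
    print(f"{name:8} {addr:#06x}")
```

Only the section bases change when the linker moves sections; the offsets recorded by the assembler stay the same, which is why `as` can emit them before any run-time address is known.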

#### bss Section

• You may allocate address space in the bss section.
• You may not dictate data to load into it before your program executes; it simply holds zeroed bytes when the program starts.
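A small Python sketch of those two rules, using a hypothetical `BssSection` class that is not part of `as`: reserving space (roughly what a directive such as `.lcomm` asks for) only grows a recorded size, the object file carries no bytes for the section, and the loader supplies zeros. The optional alignment parameter is an illustrative assumption.

```python
class BssSection:
    """Hypothetical model of bss: only a size is recorded, never contents."""

    def __init__(self) -> None:
        self.size = 0

    def reserve(self, nbytes: int, align: int = 1) -> int:
        """Reserve nbytes of address space and return the block's offset."""
        # round the current size up to a multiple of align
        offset = (self.size + align - 1) // align * align
        self.size = offset + nbytes
        return offset

    def file_bytes(self) -> bytes:
        # Nothing is stored in the object file for bss.
        return b""

    def load(self) -> bytearray:
        # At program start the whole section is zero-filled.
        return bytearray(self.size)

bss = BssSection()
counter = bss.reserve(4)
table = bss.reserve(16, align=8)
print(counter, table, bss.size)
print(len(bss.file_bytes()), len(bss.load()))
```

There is deliberately no method for writing initial contents: that is the restriction the notes describe.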

### Symbols

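• Labels: a symbol immediately followed by a colon (`:`); it represents the current value of the active location counter.
• Local symbol names: a local label is defined as `N:` and referenced as `Nb` (most recent previous definition) or `Nf` (next definition). `as` gives each definition a unique internal name: the first `1:` is named `L1C-A1`, the 44th `3:` is named `L3C-A44`, where `C-A` stands for the Control-A character, which keeps these names from clashing with ordinary symbols.
• The special symbol `.` refers to the current address that `as` is assembling into.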
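The renaming scheme can be sketched in Python as follows. This is an illustrative model, not gas's own code: the `rename` function, its input format (definitions like `"1:"`, references like `"1b"`/`"1f"`) and the use of `"\x01"` for the `C-A` character are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

CTRL_A = "\x01"  # assumed to be the "C-A" character in names such as L1C-A1

def rename(items: List[str]) -> List[str]:
    """Replace local label definitions and references with internal names."""
    counts: Dict[int, int] = defaultdict(int)
    def_names: Dict[int, str] = {}  # position of a definition -> internal name
    for pos, item in enumerate(items):
        if item.endswith(":"):
            n = int(item[:-1])
            counts[n] += 1  # ordinal of this definition of label n
            def_names[pos] = f"L{n}{CTRL_A}{counts[n]}"

    out = []
    for pos, item in enumerate(items):
        if pos in def_names:
            out.append(def_names[pos] + ":")
            continue
        direction = item[-1]
        if direction not in "bf" or not item[:-1].isdigit():
            raise ValueError(f"not a local label reference: {item!r}")
        n = int(item[:-1])
        # "Nb" searches backward, "Nf" searches forward
        search = range(pos - 1, -1, -1) if direction == "b" else range(pos + 1, len(items))
        target = next((def_names[j] for j in search
                       if j in def_names and items[j] == f"{n}:"), None)
        if target is None:
            raise ValueError(f"{item!r} has no matching definition")
        out.append(target)
    return out

items = ["1:", "1f", "1:", "1b", "3:"]
shown = [name.replace(CTRL_A, "C-A") for name in rename(items)]
print(shown)  # ['L1C-A1:', 'L1C-A2', 'L1C-A2:', 'L1C-A2', 'L3C-A1:']
```

Both references resolve to the second `1:`: `1f` looks ahead to it and `1b` looks back to it, which is why the same digit can be reused for many short-lived labels.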
• Symbol attributes: besides its name, every symbol has a value and a type.
  • Value
    • For a symbol that labels a location in the text, data, bss or absolute section, the value is the number of addresses from the start of that section to the label.
    • The value of a relocatable symbol changes as `ld` changes section base addresses during linking.
    • Absolute symbols' values do not change during linking.
    • The value of an undefined symbol:
      • If it is 0, the symbol is not defined in this assembler source file, and `ld` tries to determine its value from other files linked into the same program.
      • A non-zero value represents a `.comm` common declaration: the value is how much common storage to reserve, in bytes (addresses), and the symbol refers to the first address of the allocated storage.
  • Type
    • Contains relocation (section) information, any flag settings indicating that a symbol is external, and (optionally) other information for linkers and debuggers.
    • The exact format depends on the object-code output format in use.
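The value rules above can be summarized as one decision procedure. The Python below is a sketch under stated assumptions: `Symbol`, `CommonArea` and `link_value` are hypothetical names, the section bases and addresses are invented, and the common-storage allocator simply places each name once in order (a real linker's rules for merging several `.comm` declarations are not modeled).

```python
from dataclasses import dataclass
from typing import Dict

RELOCATABLE = ("text", "data", "bss")

@dataclass
class Symbol:
    name: str
    section: str  # "text", "data", "bss", "absolute" or "undefined"
    value: int

class CommonArea:
    """Hypothetical bump allocator standing in for the linker's common storage."""

    def __init__(self, base: int) -> None:
        self.next_free = base
        self.placed: Dict[str, int] = {}

    def place(self, name: str, size: int) -> int:
        if name not in self.placed:  # simplification: first declaration wins
            self.placed[name] = self.next_free
            self.next_free += size
        return self.placed[name]

def link_value(sym: Symbol, bases: Dict[str, int],
               other_files: Dict[str, int], common: CommonArea) -> int:
    if sym.section in RELOCATABLE:
        return bases[sym.section] + sym.value  # offset moves with the section
    if sym.section == "absolute":
        return sym.value                       # unchanged by linking
    if sym.section != "undefined":
        raise ValueError(f"unknown section {sym.section!r}")
    if sym.value == 0:
        return other_files[sym.name]           # must be defined in another file
    return common.place(sym.name, sym.value)   # value = bytes of common storage

bases = {"text": 0x1000, "data": 0x2000, "bss": 0x3000}
other_files = {"printf": 0x4000}
common = CommonArea(0x5000)
symbols = [
    Symbol("start", "text", 0x10),
    Symbol("LIMIT", "absolute", 100),
    Symbol("printf", "undefined", 0),
    Symbol("table", "undefined", 64),
    Symbol("flags", "undefined", 8),
]
resolved = {s.name: link_value(s, bases, other_files, common) for s in symbols}
print({name: hex(addr) for name, addr in resolved.items()})
```

Note how the same field means an offset, a fixed address, "look elsewhere", or a storage size depending on the symbol's section; that section information is exactly what the Type attribute carries.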

### Expressions

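• Empty expressions: an empty expression has no value; it is just whitespace or null. Wherever an absolute expression is required, you may omit the expression, and `as` assumes a value of (absolute) 0.
• Integer expressions
  • one or more arguments delimited by operators
  • Arguments are symbols, numbers or subexpressions.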
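To make "arguments delimited by operators" concrete, here is a small recursive-descent evaluator in Python. It is a simplified sketch, not gas's parser: it supports only `+`, `-`, `*`, `/` and prefix `-` (gas has more operators, such as shifts, bitwise and comparison operators, with its own precedence table), numbers in decimal, octal (leading `0`) and hex (`0x`), symbols looked up in a caller-supplied table, and parenthesized subexpressions. Division truncates toward zero as in C. An empty input yields 0, mirroring the empty-expression rule above.

```python
import re
from typing import Dict, List, Optional

# numbers (hex, or decimal/octal digits), symbols (including "."), operators
TOKEN_RE = re.compile(r"\s*(0[xX][0-9A-Fa-f]+|\d+|[A-Za-z_.$][\w.$]*|[-+*/()])")

def tokenize(text: str) -> List[str]:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_number(tok: str) -> int:
    if tok[:2] in ("0x", "0X"):
        return int(tok[2:], 16)
    if len(tok) > 1 and tok[0] == "0":  # leading zero means octal
        return int(tok[1:], 8)
    return int(tok)

def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C's '/'."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

class Parser:
    def __init__(self, tokens: List[str], symbols: Dict[str, int]) -> None:
        self.tokens, self.symbols, self.i = tokens, symbols, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self) -> int:  # lower precedence: + and -
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:  # higher precedence: * and /
        value = self.argument()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.argument()
            value = value * rhs if op == "*" else c_div(value, rhs)
        return value

    def argument(self) -> int:  # number, symbol, (subexpression) or prefix -
        tok = self.take()
        if tok is None:
            raise SyntaxError("expression ends too early")
        if tok == "-":
            return -self.argument()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return value
        if tok[0].isdigit():
            return parse_number(tok)
        if tok[0].isalpha() or tok[0] in "_.$":
            if tok not in self.symbols:
                raise NameError(f"undefined symbol {tok!r}")
            return self.symbols[tok]
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text: str, symbols: Optional[Dict[str, int]] = None) -> int:
    tokens = tokenize(text)
    if not tokens:
        return 0  # empty expression: treated as absolute 0
    parser = Parser(tokens, symbols or {})
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("2 + 3 * 4"))                                # 14
print(evaluate("end - start", {"start": 0x100, "end": 0x140}))  # 64
```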

## Syntax

### Preprocessing

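• Adjusts and removes extra whitespace: it leaves one space or tab before the keywords on a line and turns any other whitespace on the line into a single space.
• Removes comments, replacing each with a single space or with the appropriate number of newlines.
• Converts character constants into their numeric values.

• Comments: C-style `/* ... */` comments, plus line comments that run from a target-specific comment character to the end of the line.
• A statement ends at a newline (`\n`) or at a target-specific line separator character (for example `@`).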
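A toy Python version of the comment and whitespace steps, for illustration only. Assumptions: `#` is the line-comment character (as on x86 targets), blanks are always collapsed to a single space (the real preprocessor may keep a tab before keywords), string literals are copied untouched, and character constants such as `'J` are not handled, so `'#` would be misread as a comment. The `preprocess` function is hypothetical.

```python
def preprocess(src: str, line_comment: str = "#") -> str:
    """Toy version of the comment-removal and whitespace-collapsing steps."""
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == '"':                             # copy string literals untouched
            j = i + 1
            while j < n and src[j] != '"':
                j += 2 if src[j] == "\\" else 1  # skip over escaped characters
            out.append(src[i:j + 1])
            i = j + 1
        elif src.startswith("/*", i):            # C-style comment
            end = src.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("unterminated /* comment")
            newlines = src.count("\n", i, end)
            if newlines:                         # keep the line structure intact
                out.append("\n" * newlines)
            elif not out or out[-1] != " ":      # otherwise one space
                out.append(" ")
            i = end + 2
        elif c == line_comment:                  # line comment: drop up to the newline
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif c in " \t":                         # collapse runs of blanks
            if not out or out[-1] != " ":
                out.append(" ")
            i += 1
        else:
            out.append(c)
            i += 1
    # trailing blanks carry no meaning, so trim them line by line
    return "\n".join(line.rstrip() for line in "".join(out).split("\n"))

src = 'start:\tmovl  $1,   %eax   # load one\n  /* a\n  b */ nop\n  .ascii "a  # b"\n'
print(preprocess(src))
```

The multi-line comment becomes a newline rather than a space so that later lines keep their original line numbers in error messages.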
Constant examples:

```asm
.byte  74, 0112, 0b1001010, 0x4A, 0X4a, 'J, '\J  # All the same value.
.ascii "Ring the bell\7"                         # A string constant.
.octa  0x123456789abcdef0123456789ABCDEF0        # A bignum.
.float 0f-314159265358979323846264338327\
95028841971.693993751E-40                        # - pi, a flonum.
```
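A quick Python cross-check of the comments in the constants example: 74 written in decimal, octal, binary, hex (either case) and as the character `J`; the `\7` escape being ASCII 7 (bell); the `.octa` bignum fitting in 16 bytes; and the flonum digits, with the two listing lines treated as one literal, equaling minus pi. The `'\J` spelling is not modeled here.

```python
import math
from decimal import Decimal

# .byte line: decimal, octal, binary, hex (both cases) and a character constant
byte_values = [74, 0o112, 0b1001010, 0x4A, 0X4a, ord("J")]
print(byte_values)

# .ascii: "\7" is an octal escape for ASCII 7, the BEL (bell) character
bell_string = "Ring the bell\7"
print(ord(bell_string[-1]))  # 7

# .octa: a bignum that must fit in 16 bytes (128 bits)
octa = 0x123456789abcdef0123456789ABCDEF0
print(octa.bit_length())  # 125

# .float: 41 integer digits scaled by E-40 give -3.14159...
flonum = Decimal("-31415926535897932384626433832795028841971.693993751E-40")
print(flonum)
print(math.isclose(float(flonum), -math.pi))  # True
```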