光阴冢: a private plot in cyberspace

Implementation of the GNU Assembler

Oct 21, 2020  

Concepts

Sections and Relocation

  • Assigning run-time addresses to sections is called relocation.
  • An object file written by as has at least three sections, any of which may be empty. These are named text, data and bss sections.
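As a rough illustration of relocation, the following Python sketch lays out the three sections back to back and rebases a section-relative offset onto the chosen base address. The section sizes, the start address and the function names are hypothetical; a real linker such as ld follows far more elaborate rules.

```python
# Toy model of relocation: pick a run-time base address for each section,
# then rebase section-relative offsets onto it.
# Sizes and the start address are made-up illustration values.

def assign_bases(section_sizes, start=0x1000):
    """Lay sections out back to back and return {name: base address}."""
    bases = {}
    address = start
    for name, size in section_sizes.items():
        bases[name] = address
        address += size
    return bases


def relocate(section, offset, bases):
    """Turn a (section, offset) pair into a run-time address."""
    return bases[section] + offset


sizes = {"text": 0x40, "data": 0x10, "bss": 0x20}  # hypothetical sizes
bases = assign_bases(sizes)
print({name: hex(base) for name, base in bases.items()})
print(hex(relocate("data", 4, bases)))
```

An empty section simply has size 0 in this model, so it takes up no address space and the next section starts at the same base.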

Linker Sections

  • text section and data section
    • hold the program
    • text section is often shared among processes: it contains instructions, constants and the like.
    • data section of a running program is usually alterable: for example, C variables would be stored in the data section.
  • bss section
    • contains zeroed bytes when your program begins running
    • used to hold uninitialized variables or common storage
    • was invented to eliminate those explicit zeros from object files
  • absolute section
    • addresses that do not change when relocating
  • undefined section
    • a catch-all for address references to objects not in the preceding sections.

bss Section

  • you can allocate address space in the bss section
  • you may not dictate data to load into it before your program executes; its bytes are zero when the program starts
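The idea that bss only reserves address space can be sketched with a toy object file in Python. The dictionary layout and the function names below are hypothetical and do not correspond to any real object-file format; the point is only that bss contributes a size to the file, while the zero bytes appear when the image is loaded.

```python
# Toy object file: text and data carry their bytes, bss carries only a size.
# This layout is a hypothetical illustration, not a real object-file format.

def make_object(text, data, bss_size):
    return {"text": bytes(text), "data": bytes(data), "bss_size": bss_size}


def file_payload_size(obj):
    """Bytes that must be stored in the file: bss contributes nothing."""
    return len(obj["text"]) + len(obj["data"])


def load(obj):
    """Build the run-time image; bss is zero-filled at load time."""
    return obj["text"] + obj["data"] + bytes(obj["bss_size"])


obj = make_object(text=[0x90, 0xC3], data=[1, 2, 3, 4], bss_size=4096)
image = load(obj)
print(file_payload_size(obj))  # stored bytes
print(len(image))              # bytes occupied at run time
```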

Symbols

  • Labels: a label represents the current value of the active location counter
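  • Local Symbol Names: internally, the first 1: is named L1C-A1 and the 44th 3: is named L3C-A44, where C-A stands for a control character that cannot be typed into an ordinary symbol name.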
  • The special symbol . refers to the current address that as is assembling into.
  • Symbol Attributes:
    • Value
      • For a symbol that labels a location in the text, data, bss or absolute sections, the value is the number of addresses from the start of that section to the label.
      • For the text, data and bss sections, the value of a symbol changes as ld changes section base addresses during linking.
      • Absolute symbols' values do not change during linking.
      • The value of an undefined symbol is treated specially:
        • If it is 0 then the symbol is not defined in this assembler source file, and ld tries to determine its value from other files linked into the same program.
        • A non-zero value represents a .comm common declaration. The value is how much common storage to reserve, in bytes (addresses). The symbol refers to the first address of the allocated storage.
    • Type
      • contains relocation (section) information, any flag settings indicating that a symbol is external, and (optionally), other information for linkers and debuggers.
      • The exact format depends on the object-code output format in use.
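The local symbol naming scheme can be sketched as a per-number counter. This is a minimal Python model, assuming that C-A denotes the character with ASCII value 1 and that the name is simply the prefix L, the label number, that character and the occurrence count; the class and function names are hypothetical.

```python
from collections import defaultdict

CTRL_A = "\x01"  # assumption: the character the notes write as C-A


class LocalLabelNamer:
    """Hand out internal names for repeated numeric labels such as 1: or 3:."""

    def __init__(self):
        self._counts = defaultdict(int)  # label number -> definitions seen

    def define(self, number):
        """Record one more definition of `number:` and return its name."""
        self._counts[number] += 1
        return f"L{number}{CTRL_A}{self._counts[number]}"


def readable(name):
    """Show the control character the way the notes spell it."""
    return name.replace(CTRL_A, "C-A")


namer = LocalLabelNamer()
first_one = namer.define(1)
for _ in range(44):
    last_three = namer.define(3)
print(readable(first_one), readable(last_three))
```

Because each definition gets a fresh count, the same digit can be reused as a label many times in one source file without the internal names colliding.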

Expressions

  • Empty expressions: no value; where an absolute expression is required but omitted, as assumes 0
  • Integer Expressions
    • arguments delimited by operators
    • Arguments are symbols, numbers or subexpressions.
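To make "arguments delimited by operators" concrete, here is a small recursive-descent evaluator in Python. It is only a sketch: it assumes decimal numbers, a symbol table passed in as a dictionary, and just the operators +, - and * with conventional precedence, which is a simplification of what as accepts. An empty expression is evaluated as 0.

```python
import re

# One token: a decimal number, a symbol name, or an operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_.][\w.$]*|[-+*()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, symbols):
    """Evaluate an integer expression; `symbols` maps names to values."""
    tokens = tokenize(text)
    if not tokens:
        return 0  # empty expression: treated as absolute 0
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def argument():
        # An argument is a number, a symbol, or a parenthesised subexpression.
        tok = take()
        if tok == "(":
            value = total()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        if tok == "-":
            return -argument()
        if tok.isdigit():
            return int(tok)
        return symbols[tok]

    def product():
        value = argument()
        while peek() == "*":
            take()
            value *= argument()
        return value

    def total():
        value = product()
        while peek() in ("+", "-"):
            op = take()
            rhs = product()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = total()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(evaluate("start + 4 * (len - 1)", {"start": 0x100, "len": 8}))
```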

Assembler Directives

Syntax

Preprocessing

  • Adjust and remove extra whitespace.
  • Remove comments.
  • Convert character constants into their numeric values.
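The three preprocessing steps can be sketched on a single source line in Python. This is a simplified model, not how as implements its preprocessor: it assumes # as the line comment character, handles only one-character constants of the form 'X, and ignores string literals entirely. The function name is hypothetical.

```python
import re


def preprocess(line, comment_char="#"):
    """Apply the three preprocessing steps to one source line."""
    # Convert character constants first so that a constant such as '#
    # is not mistaken for the start of a comment.
    line = re.sub(r"'(.)", lambda m: str(ord(m.group(1))), line)
    # Remove C-style comments, then the line comment.
    line = re.sub(r"/\*.*?\*/", " ", line)
    line = line.split(comment_char, 1)[0]
    # Collapse runs of whitespace and trim both ends.
    return " ".join(line.split())


print(preprocess(".byte   'J ,  74 /* same */  # note"))
```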

Comments

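  • C-style: /* ... */ (these comments cannot be nested)
  • Line comments: from the line comment character, which depends on the target, to the end of the line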

Statements

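A statement ends at a newline (\n) or at a line separator character. The separator depends on the target: @ on some, ; on many others.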

Constants

.byte  74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
.ascii "Ring the bell\7"                  # A string constant.
.octa  0x123456789abcdef0123456789ABCDEF0 # A bignum.
.float 0f-314159265358979323846264338327\
95028841971.693993751E-40                 # - pi, a flonum.
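To check the "All the same value" comment on the .byte line, the Python sketch below parses each spelling by hand. It is an illustration under stated assumptions, not the parser used by as: a leading 0 is taken to mean that the digits are accumulated in base 8 even when a digit is 9 (which is what makes 092 come out as 74), and '\J is taken to mean the plain character J. The function name is hypothetical.

```python
def parse_integer(token):
    """Return the value of one integer or character constant."""
    if token.startswith("'"):
        body = token[1:]
        if body.startswith("\\"):
            body = body[1:]  # assumption: '\J stands for plain J
        return ord(body[0])
    lowered = token.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)  # hexadecimal: 0x or 0X prefix
    if token.startswith("0") and len(token) > 1:
        # Assumption: accumulate digits in base 8, even a stray 9.
        value = 0
        for digit in token[1:]:
            value = value * 8 + int(digit)
        return value
    return int(token)  # plain decimal


tokens = ["74", "0112", "092", "0x4A", "0X4a", "'J", "'\\J"]
values = [parse_integer(t) for t in tokens]
print(values)
```

Real escape sequences such as '\n are not handled here; the sketch only covers the spellings that appear in the .byte example.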