## GCN Kernel Machine State

This chapter describes the state of a machine compliant with GCN 1.0/1.1/1.2.

Table of available registers:

Name | Long name | Size | Description
-----|-----------|------|-----------------------
PC | Program counter | 40 bits | Current instruction address in memory
V0-V255 | VGPR | 32 bits | Vector general purpose register
S0-S103 | SGPR | 32 bits | Scalar general purpose register (GCN 1.0/1.1)
S0-S101 | SGPR | 32 bits | Scalar general purpose register (GCN 1.2)
S0-S103 | SGPR | 32 bits | Scalar general purpose register (GCN 1.4)
LDS | Local Data Share | 32 kB | Local Data Share memory (R/W)
EXEC | Execute Mask | 64 bits | Each bit of the mask controls execution for one lane
EXECZ | Execute Is Zero | 1 bit | Set if EXEC mask is zero
VCC | Vector Condition Code | 64 bits | Bit mask with one bit per lane
VCCZ | VCC Is Zero | 1 bit | Set if VCC is zero
SCC | Scalar Condition Code | 1 bit | Condition code for scalar operations
FLAT_SCRATCH | Flat scratch address | 64 bits | The base address of scratch memory (GCN 1.1 or later)
XNACK_MASK | Address Trans. Failure | 64 bits | Bits indicate failure of address translation. Carrizo APU only
STATUS | Status | 32 bits | Read-only status register
MODE | Mode | 32 bits | R/W mode register
M0 | Memory Register | 32 bits | Additional register used in various cases
TRAPSTS | Trap Status | 32 bits | Holds information about exceptions and pending traps
TBA | Trap Base Address | 64 bits | Pointer to the current trap handler program
TMA | Trap Memory Address | 64 bits | Temporary register for shader operations
TTMP0-TTMP11 | Trap Temporary SGPRs | 32 bits | SGPRs available only to the trap handler for temporary storage
VMCNT | VM Instruction Count | 4 bits | Counts the number of VM instructions not yet completed
EXPCNT | Export Count | 3 bits |
LGKMCNT | LDS, GDS, Kmem, Message Count | 5 bits | Counts the number of LDS, GDS, K mem and message instructions

### Initial vector registers

The first three vector registers hold the local IDs for each dimension.

### Scalar registers layout

The user data registers hold the execution setup (global offset, pointers, argument pointers, the arguments themselves). User data allows the host to pass any constant data to the kernel. Bits 1-5 of PGM_RSRC2 indicate how many of the first scalar registers hold user data.

The following scalar registers store the group ID, and they differ for every wavefront. The number of these registers is determined by the number of enabled dimensions (fields TGID_X_EN, TGID_Y_EN and TGID_Z_EN in PGM_RSRC2). The next scalar registers hold the TG_SIZE value and the scratch buffer wave offset (used for handling the scratch buffer). The last allocated SGPRs are VCC, FLAT_SCRATCH and XNACK_MASK, depending on the GCN architecture.

The following table shows the layout of the SGPRs:

First register | Number of registers | Description
-----------------|-------------------------------|----------------------
SGPR0 | number of user data registers | User data registers
next SGPR | number of enabled dimensions | Group Id
next SGPR | 1 if TGSIZE_EN enabled | TGSIZE
next SGPR | 1 if SCRATCH enabled | Scratch wave offset
SGPR[N-6] | 2 registers | FLAT_SCRATCH (GCN 1.2)
SGPR[N-4] | 2 registers | XNACK_MASK (GCN 1.2) or FLAT_SCRATCH (GCN 1.1)
SGPR[N-2] | 2 registers | VCC

Note: N is the number of allocated SGPRs.

### STATUS Register

Table of fields for STATUS Register:

Bits | Name | Description
--------|-----------|------------------------------------
0 | SCC | Scalar condition code
1-2 | SPI_PRIO | Wavefront priority set by SPI while creating wave
3-4 | WAVE_PRIO | Wavefront priority set by the shader program
5 | PRIV | Privileged mode
6 | TRAP_EN | Indicates that trap handler is present
7 | TTRACE_EN | Indicates whether thread trace is enabled for this wavefront
8 | EXPORT_RDY | ...
9 | EXECZ | Set if EXEC is zero
10 | VCCZ | Set if VCC is zero
11 | IN_TG | Set if workgroup is greater than one wavefront
12 | IN_BARRIER | Set if wavefront is waiting for a barrier
13 | HALT | Wavefront is halted or scheduled to halt
14 | TRAP | Wavefront will enter the trap handler as soon as possible
15 | TTRACE_CU_EN | Enables/disables thread trace for this compute unit (CU)
16 | VALID | Wavefront is active
17 | ECC_ERR | An ECC error has occurred
18 | SKIP_EXPORT | ???
19 | PERF_EN | Performance counters enabled for this wavefront
20 | COND_DBG_USER | Conditional debug indicator for user mode
21 | COND_DBG_SYS | Conditional debug indicator for system mode
22 | ALLOW_REPLAY | Indicates that ATC replay is enabled
23 | INST_ACC | ???
24-26 | DISPATCH_CACHE_CTRL | Indicates the cache policies for this dispatch
27 | MUST_EXPORT | ???

### MODE Register

Table of fields for MODE Register:

Bits | Name | Description
--------|-----------|------------------------------------
0-3 | FP_ROUND | Set round modes for single and double precision
4-7 | FP_DENORM | Set denormal mode for single and double precision
8 | DX10_CLAMP | Treat NaNs as in DX10 mode (used by vector ALU)
9 | IEEE | IEEE mode ???
10 | LOD_CLAMPED | Sticky bit for LOD clamping
11 | DEBUG | Forces the wavefront to jump to exception handler
12-18 | EXCP_EN | Enable mask for exceptions
27 | GPR_IDX_EN | GPR index enable (only for GCN 1.2)
29-31 | CSP | Conditional branch stack pointer

The rounding mode for single precision is controlled by bits 0-1 of the MODE register. The rounding mode for double precision and half precision is controlled by bits 2-3. List of possible values:

Value | Description
-------|---------------------------------
0 | Nearest to even
1 | +Infinity
2 | -Infinity
3 | Toward zero

The denormal mode for single precision is controlled by bits 4-5 of the MODE register. Bits 6-7 of the MODE register control the denormal mode for double precision and half precision operations.

List of possible values:

Value | Description
-------|---------------------------------
0 | flush input and output denormals
1 | allow input denormals, flush output denormals
2 | flush input denormals, allow output denormals
3 | allow input and output denormals

The initial values of the FP_ROUND and FP_DENORM fields (the first 8 bits of the MODE register) can be set with the .floatmode pseudo-operation.

### GPR indexing mode (GCN 1.2)

GCN 1.2 introduces the GPR indexing mode, which makes indexed access to VGPRs easier. Bit 27 of the MODE register indicates whether this mode is enabled. The M0 register holds the index and the mode of GPR indexing. When this mode is enabled, the index is added to the index of the specified VGPR used in a vector instruction. The mode specifies to which operands of the vector instruction the GPR index is added. If the sum of the GPR index and the VGPR index goes beyond the last available VGPR, or the operand is not a VGPR (SGPR or other), then the operand register is substituted by the V0 register.

The lowest 8 bits of the M0 register hold the GPR index. Bits 12-15 hold the GPR indexing mode. The GPR indexing mode bits table:

Bit | Description
----|--------------------------------------
0 | Apply GPR indexing to VSRC0 operand
1 | Apply GPR indexing to VSRC1 operand
2 | Apply GPR indexing to VSRC2 operand
3 | Apply GPR indexing to VDST operand
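
The GPR indexing rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not assembler or hardware code: the function names and the `num_vgprs` parameter (the number of VGPRs available to the wavefront) are hypothetical, and only operands that are already VGPRs are modelled. The bit positions follow the description above.

```python
# Illustrative model of GPR indexing (GCN 1.2). All names are hypothetical.
GPR_IDX_EN_BIT = 27  # MODE register bit that enables GPR indexing
OPERANDS = ("VSRC0", "VSRC1", "VSRC2", "VDST")  # indexing mode bits 0-3


def decode_gpr_indexing(mode_reg, m0):
    """Return (enabled, index, operands) decoded from MODE and M0."""
    enabled = bool((mode_reg >> GPR_IDX_EN_BIT) & 1)
    index = m0 & 0xFF             # lowest 8 bits of M0: GPR index
    mode_bits = (m0 >> 12) & 0xF  # bits 12-15 of M0: GPR indexing mode
    operands = {name for bit, name in enumerate(OPERANDS)
                if (mode_bits >> bit) & 1}
    return enabled, index, operands


def effective_vgpr(operand, vgpr, mode_reg, m0, num_vgprs=256):
    """Return the VGPR number actually accessed for a VGPR operand.

    num_vgprs is an assumed parameter: the count of available VGPRs.
    """
    enabled, index, operands = decode_gpr_indexing(mode_reg, m0)
    if not enabled or operand not in operands:
        return vgpr  # indexing does not apply to this operand
    target = vgpr + index
    # A sum beyond the last available VGPR is replaced by V0.
    return target if target < num_vgprs else 0
```

For example, with MODE bit 27 set and `M0 = (0b1001 << 12) | 5`, indexing applies to VSRC0 and VDST with an index of 5, so a VSRC0 operand written as V10 would read V15 in this model, while VSRC1 is left unchanged.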