This page documents every control signal in the 88-bit instruction format, organized by functional group.

System control signals

Bits [4:0] contain five 1-bit control flags:
sys_switch_in
bit
default:"0"
System mode switch - controls whether the TPU is actively processing
  • 1 = System active, computation in progress
  • 0 = System idle
Bit position: [0]
ub_rd_start_in
bit
default:"0"
Unified Buffer read transaction trigger
  • 1 = Start a new read transaction
  • 0 = No read initiated
Bit position: [1]
ub_rd_transpose
bit
default:"0"
Unified Buffer read transpose mode
  • 1 = Transpose data during read (for loading transposed weight matrices)
  • 0 = Normal read without transpose
Bit position: [2]
ub_wr_host_valid_in_1
bit
default:"0"
Host write channel 1 valid flag
  • 1 = Data on channel 1 is valid, write to UB
  • 0 = No valid data on channel 1
Bit position: [3]
ub_wr_host_valid_in_2
bit
default:"0"
Host write channel 2 valid flag
  • 1 = Data on channel 2 is valid, write to UB
  • 0 = No valid data on channel 2
Bit position: [4]
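As a rough illustration of the layout above, the five flags can be packed into bits [4:0] with plain shifts. This is only a sketch: the helper name `pack_sys_flags` and the dictionary are hypothetical and not part of the TPU sources; only the flag names and bit positions come from this page.

```python
# Bit positions of the five system control flags, as documented above.
SYS_FLAG_BITS = {
    "sys_switch_in": 0,
    "ub_rd_start_in": 1,
    "ub_rd_transpose": 2,
    "ub_wr_host_valid_in_1": 3,
    "ub_wr_host_valid_in_2": 4,
}


def pack_sys_flags(**flags):
    """Return the 5-bit value for instruction bits [4:0].

    Flags that are not passed keep their documented default of 0.
    (Hypothetical helper, for illustration only.)
    """
    word = 0
    for name, value in flags.items():
        if name not in SYS_FLAG_BITS:
            raise KeyError(f"unknown flag: {name}")
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        word |= value << SYS_FLAG_BITS[name]
    return word


# Start a transposed read: bits 1 and 2 set.
print(format(pack_sys_flags(ub_rd_start_in=1, ub_rd_transpose=1), "05b"))  # 00110
```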

Unified Buffer read control

These fields control data reads from the Unified Buffer:
ub_rd_col_size
2-bit
default:"0"
Number of columns to read from UB
Value  Columns
00     0
01     1
10     2
11     3
Bit position: [6:5]
ub_rd_row_size
8-bit
default:"0"
Number of rows to read from UB (0-255)
Specifies how many rows of data to read in the current transaction.
Examples:
  • 0x08 = Read 8 rows
  • 0x04 = Read 4 rows (batch size)
  • 0x01 = Read 1 row
Bit position: [14:7]
ub_rd_addr_in
2-bit
default:"0"
Unified Buffer read address pointer
Selects the starting address in UB for the read transaction.
The actual implementation uses 2 bits [16:15], providing 4 possible addresses. The README documentation shows 8 bits [22:15], which is a discrepancy with the hardware.
Bit position: [16:15]
ub_ptr_sel
3-bit
default:"0"
Unified Buffer pointer select - routes UB read data to different modules
Value  Destination
000    Systolic array (left input)
001    Systolic array (top input/weights)
010    VPU bias module
011    VPU loss module
100    VPU activation derivative module
101    VPU gradient descent (bias)
110    VPU gradient descent (weights)
Example: 3'b001 = route read pointer to weight inputs of systolic array
Bit position: [19:17]
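The read-control fields above occupy bits [19:5] of the instruction. A minimal sketch of how they could be assembled, assuming the 2-bit `ub_rd_addr_in` width of the hardware implementation; the function name `pack_ub_read` and its parameter names are hypothetical:

```python
def pack_ub_read(col_size, row_size, addr, ptr_sel):
    """Pack the Unified Buffer read-control fields into bits [19:5].

    Each tuple is (value, least-significant bit, width), taken from the
    bit positions documented above. Hypothetical helper, not TPU source.
    """
    fields = [
        (col_size, 5, 2),   # ub_rd_col_size  [6:5]
        (row_size, 7, 8),   # ub_rd_row_size  [14:7]
        (addr, 15, 2),      # ub_rd_addr_in   [16:15]
        (ptr_sel, 17, 3),   # ub_ptr_sel      [19:17]
    ]
    word = 0
    for value, lsb, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << lsb
    return word


# Read 4 rows with col_size = 2 from address 0, routed to the
# systolic array weight inputs (ub_ptr_sel = 3'b001).
print(hex(pack_ub_read(col_size=2, row_size=4, addr=0, ptr_sel=0b001)))  # 0x20240
```

Range checks matter here because an oversized value would silently spill into the neighbouring field.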

Host write data

The TPU provides two write channels for loading data into the Unified Buffer:
ub_wr_host_data_in_1
16-bit fixed-point
default:"0"
First host write data word
Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)
Example: 0xABCD writes the value represented by this fixed-point encoding
Bit position: [35:20]
ub_wr_host_data_in_2
16-bit fixed-point
default:"0"
Second host write data word
Fixed-point format: Q8.8 (8 integer bits, 8 fractional bits)
Enables writing two values per instruction cycle for faster data loading.
Example: 0x1234
Bit position: [51:36]
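To make the Q8.8 format concrete, the sketch below converts between real numbers and 16-bit words by scaling with 2^8 = 256. This page does not state whether the data words are signed, so the `signed` parameter (two's complement when true) is an assumption, as are the helper names `to_q8_8` and `from_q8_8`.

```python
def to_q8_8(x, signed=True):
    """Encode a real number as a 16-bit Q8.8 word (scale factor 256).

    ASSUMPTION: signed=True treats the word as two's complement; the
    page only says "8 integer bits, 8 fractional bits".
    """
    raw = round(x * 256)
    lo, hi = (-32768, 32767) if signed else (0, 65535)
    if not lo <= raw <= hi:
        raise OverflowError(f"{x} is out of range for Q8.8")
    return raw & 0xFFFF


def from_q8_8(word, signed=True):
    """Decode a 16-bit Q8.8 word back to a float."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    if signed and word & 0x8000:
        word -= 0x10000
    return word / 256


print(hex(to_q8_8(1.5)))   # 0x180
print(from_q8_8(0x1234))   # 18.203125
```

Under the signed assumption, a word such as 0xABCD decodes to a negative value; with `signed=False` the same bits decode to a positive one, so the interpretation should be checked against the hardware before relying on it.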

Vector Processing Unit control

vpu_data_pathway
4-bit
default:"0"
VPU pipeline configuration - selects which modules are active
Value  Configuration               Use case
0000   Bypass                      Gradient calculation
0001   Activation derivative only  Backpropagation
1100   Bias + Activation           Forward pass layer 1
1111   Bias + Activation + Loss    Forward pass final layer
See VPU data pathways for complete routing details.
Bit position: [55:52]
inv_batch_size_times_two_in
16-bit fixed-point
default:"0"
Precomputed scaling factor for MSE loss backpropagation
Fixed-point format: Q8.8
Calculation: 2 / batch_size
Examples:
  • Batch size 4: 0x0080 (2/4 = 0.5 in Q8.8)
  • Batch size 32: 0x0010 (2/32 = 0.0625 in Q8.8)
Bit position: [71:56]
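The two examples above follow from multiplying 2 / batch_size by 256 (the Q8.8 scale). A small sketch of that precomputation; the function name is hypothetical, and rounding to the nearest code for batch sizes that do not divide evenly is an assumption, since this page only shows exact cases:

```python
def inv_batch_size_times_two(batch_size):
    """Return 2 / batch_size as a 16-bit Q8.8 word.

    ASSUMPTION: values that are not exactly representable are rounded
    to the nearest Q8.8 code.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    raw = round(2 / batch_size * 256)
    if raw > 0xFFFF:
        raise OverflowError("result does not fit in 16 bits")
    return raw


print(f"0x{inv_batch_size_times_two(4):04X}")   # 0x0080
print(f"0x{inv_batch_size_times_two(32):04X}")  # 0x0010
```

Batch sizes above 512 yield a value below one Q8.8 step (1/256) and therefore lose precision or round to zero.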
vpu_leak_factor_in
16-bit fixed-point
default:"0"
Leak factor for Leaky ReLU activation function
Fixed-point format: Q8.8
Common values:
  • 0x0080 = 0.5 (typical for Leaky ReLU)
  • 0x0019 ≈ 0.098 (25/256, i.e. 0.1 truncated to Q8.8; common alternative)
  • 0x0000 = 0.0 (standard ReLU)
Example: 0x00A0 = 0.625 (160/256 in Q8.8)
Bit position: [87:72]
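Putting all the fields on this page together, a complete 88-bit instruction word can be modelled as a single integer. The sketch below is one possible encoder and decoder built from the documented bit positions (with the 2-bit `ub_rd_addr_in` of the hardware); the names `FIELDS`, `encode`, and `decode` are hypothetical and do not refer to any tool shipped with the TPU.

```python
# (name, least-significant bit, width) for every field on this page.
FIELDS = [
    ("sys_switch_in", 0, 1),
    ("ub_rd_start_in", 1, 1),
    ("ub_rd_transpose", 2, 1),
    ("ub_wr_host_valid_in_1", 3, 1),
    ("ub_wr_host_valid_in_2", 4, 1),
    ("ub_rd_col_size", 5, 2),
    ("ub_rd_row_size", 7, 8),
    ("ub_rd_addr_in", 15, 2),
    ("ub_ptr_sel", 17, 3),
    ("ub_wr_host_data_in_1", 20, 16),
    ("ub_wr_host_data_in_2", 36, 16),
    ("vpu_data_pathway", 52, 4),
    ("inv_batch_size_times_two_in", 56, 16),
    ("vpu_leak_factor_in", 72, 16),
]


def encode(**values):
    """Build an 88-bit instruction; omitted fields stay at their default 0."""
    layout = {name: (lsb, width) for name, lsb, width in FIELDS}
    word = 0
    for name, value in values.items():
        lsb, width = layout[name]  # KeyError for unknown field names
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def decode(word):
    """Split an 88-bit instruction back into a {field: value} dict."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in FIELDS}


instr = encode(sys_switch_in=1, vpu_data_pathway=0b1100,
               vpu_leak_factor_in=0x0019)
print(f"{instr:022x}")                    # 88 bits = 22 hex digits
print(decode(instr)["vpu_data_pathway"])  # 12
```

The field widths sum to 88, which matches the instruction size stated at the top of this page.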

Signal timing

Control signals follow these timing conventions:
  • Start signals (ub_rd_start_in): Assert for one cycle to initiate operation
  • Valid signals (ub_wr_host_valid_in_*): Hold high while data is valid
  • Mode signals (ub_rd_transpose, sys_switch_in): Set before starting operation
  • Data signals: Must be stable when corresponding valid signal is high
In the test sequences, start signals are typically asserted for one cycle, then cleared while the operation completes.
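The conventions above can be sketched as a per-cycle sequence of field settings for a single read: mode and size fields are set first, the start signal is pulsed for exactly one cycle, and then cleared while the read completes. The generator name `read_sequence`, the one-cycle setup phase, and the `idle_cycles` parameter are assumptions for illustration; the actual number of cycles a read takes is not specified on this page.

```python
def read_sequence(row_size, col_size, ptr_sel, transpose=0, idle_cycles=2):
    """Yield one {field: value} dict per clock cycle for a UB read.

    Hypothetical helper: the setup cycle and idle_cycles count are
    assumptions, not values taken from the hardware.
    """
    # Mode and size fields are held constant across the whole sequence.
    base = {
        "sys_switch_in": 1,
        "ub_rd_transpose": transpose,
        "ub_rd_row_size": row_size,
        "ub_rd_col_size": col_size,
        "ub_ptr_sel": ptr_sel,
    }
    # Cycle 0: mode signals set before the operation starts.
    yield {**base, "ub_rd_start_in": 0}
    # Cycle 1: start asserted for exactly one cycle.
    yield {**base, "ub_rd_start_in": 1}
    # Remaining cycles: start cleared while the operation completes.
    for _ in range(idle_cycles):
        yield {**base, "ub_rd_start_in": 0}


for cycle, fields in enumerate(read_sequence(row_size=4, col_size=2, ptr_sel=0b001)):
    print(cycle, fields["ub_rd_start_in"])
```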