Mesh Systems and Vertex Packing
This document provides a complete, replicable guide to Pyrite’s mesh architecture, including data structures, vertex packing, the greedy meshing algorithm, and ambient occlusion (AO) calculation.
Overview
Pyrite renders millions of voxels using a multi-stage mesh pipeline:
Greedy Meshing: CPU-side algorithm merges adjacent coplanar faces with the same voxel ID into large rectangles.
Vertex Packing: Vertex attributes compressed into 32-bit integers to minimize GPU memory.
Lighting: Per-vertex smoothed light values sampled from adjacent blocks.
AO Calculation: Corner darkness determined by how many non-transparent blocks surround each corner.
GPU Upload: Main thread creates VAO/VBO objects from packed data.
Rendering: Draw calls per mesh (opaque + transparent passes).
Mesh Classes Hierarchy
BaseMesh (Core Abstract Class)
Base class for all mesh types. Defines the interface:
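import moderngl

class BaseMesh:
    def __init__(self, ctx, program):
        self.ctx = ctx                          # ModernGL context
        self.program = program                  # shader program to bind during render
        self.vao = None                         # Vertex Array Object (created on upload)
        self.vbo = None                         # Vertex Buffer Object (created on upload)
        self.vertex_count = 0
        self.index_count = 0
        self.render_mode = moderngl.TRIANGLES   # primitive type (GL_TRIANGLES default)

    def render(self):
        # Bind program, VAO, draw (implementation varies by subclass)
        raise NotImplementedError

    def render_instanced(self):
        # Render multiple instances
        raise NotImplementedError

    def destroy(self):
        # Release GPU resources
        if self.vao is not None:
            self.vao.release()
        if self.vbo is not None:
            self.vbo.release()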
ChunkMesh (Per-Chunk Geometry)
Represents a single 48x48x48 chunk. Created during chunk building.
class ChunkMesh(BaseMesh):
    def __init__(self, chunk_voxels, chunk_lightmap, chunk_pos, ctx, program, world_data):
        super().__init__(ctx, program)
        self.chunk_pos = chunk_pos  # (int, int, int) chunk coordinates in the world
        # Greedy mesh the chunk and store the results:
        self.vertex_data = None     # np.ndarray (uint32, packed vertices)
        self.light_data = None      # np.ndarray (uint32, packed light)
        self.opaque_count = 0       # vertices in the opaque range (drawn first)
        self.water_count = 0        # vertices in the water range (appended after opaque)

    def render(self):
        # Render opaque faces, then water faces, in separate passes
        ...
CloudMesh, CubeMesh, ItemMesh, ObjMesh (Specialized)
CloudMesh: Fixed 2D procedural clouds (sky)
CubeMesh: Renders static cubes (UI, debugging)
ItemMesh: Item entities dropped in world
ObjMesh: Wavefront .obj models (trees, items)
Vertex Packing: 32-Bit Format
To minimize GPU bandwidth and memory, all of a vertex's attributes except light (which has its own uint32) are bit-packed into a single 32-bit unsigned integer.
Packed Vertex Layout:
Bits 31-26 (6 bits): X coordinate (0-48, since quad corners can lie on the far chunk face; 6 bits cover 0-63)
Bits 25-20 (6 bits): Y coordinate (0-48)
Bits 19-14 (6 bits): Z coordinate (0-48)
Bits 13-6 (8 bits): Voxel ID (0-255)
Bits 5-3 (3 bits): Face ID (0-5, one of 6 faces)
Bits 2-1 (2 bits): AO ID (0-3, ambient occlusion level)
Bit 0 (1 bit): Flip ID (0 or 1, diagonal flip flag)
Total: 32 bits = 4 bytes per vertex, versus 28 bytes if the seven attributes (x, y, z, id, face, ao, flip) were each stored as a separate 32-bit value. Light is stored in its own uint32 (see below).
Packing Formula:
packed_data = (x & 0x3F) << 26 \
| (y & 0x3F) << 20 \
| (z & 0x3F) << 14 \
| (voxel_id & 0xFF) << 6 \
| (face_id & 0x7) << 3 \
| (ao_id & 0x3) << 1 \
| (flip_id & 0x1)
Unpacking (in Vertex Shader):
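void unpack(uint packed_data) {
    // x, y, z, voxel_id, face_id, ao_id, flip_id are assumed to be int variables declared at shader scope.
    // The u suffix keeps both operands of & unsigned (GLSL 3.30 has no implicit int-to-uint conversion).
    x = int((packed_data >> 26) & 0x3Fu);
    y = int((packed_data >> 20) & 0x3Fu);
    z = int((packed_data >> 14) & 0x3Fu);
    voxel_id = int((packed_data >> 6) & 0xFFu);
    face_id = int((packed_data >> 3) & 0x7u);
    ao_id = int((packed_data >> 1) & 0x3u);
    flip_id = int(packed_data & 0x1u);
}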
Light Data (Separate uint32):
Bits 7-4 (4 bits): Sunlight (0-15)
Bits 3-0 (4 bits): Blocklight (0-15)
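For testing the layout outside the shader, here is a small NumPy sketch that packs and unpacks whole arrays at once using the bit positions listed above, plus the 4-bit sun/block light byte. The function names pack_vertex_words, unpack_vertex_words, pack_light, and unpack_light are illustrative, not taken from the engine.

```python
import numpy as np

def pack_vertex_words(x, y, z, voxel_id, face_id, ao_id, flip_id):
    """Pack vertex attributes (scalars or arrays) into uint32 words, layout as above."""
    x, y, z, voxel_id, face_id, ao_id, flip_id = (
        np.asarray(a, dtype=np.uint32) for a in (x, y, z, voxel_id, face_id, ao_id, flip_id)
    )
    return ((x & 0x3F) << 26 | (y & 0x3F) << 20 | (z & 0x3F) << 14
            | (voxel_id & 0xFF) << 6 | (face_id & 0x7) << 3
            | (ao_id & 0x3) << 1 | (flip_id & 0x1))

def unpack_vertex_words(packed):
    """Inverse of pack_vertex_words; mirrors the GLSL unpack() above."""
    p = np.asarray(packed, dtype=np.uint32)
    return {
        "x": (p >> 26) & 0x3F,
        "y": (p >> 20) & 0x3F,
        "z": (p >> 14) & 0x3F,
        "voxel_id": (p >> 6) & 0xFF,
        "face_id": (p >> 3) & 0x7,
        "ao_id": (p >> 1) & 0x3,
        "flip_id": p & 0x1,
    }

def pack_light(sun, block):
    # Sunlight in bits 7-4, blocklight in bits 3-0
    return (np.asarray(sun, dtype=np.uint32) & 0xF) << 4 | (np.asarray(block, dtype=np.uint32) & 0xF)

def unpack_light(light):
    l = np.asarray(light, dtype=np.uint32)
    return (l >> 4) & 0xF, l & 0xF
```

Because every field is masked before shifting, an out-of-range value is truncated rather than corrupting its neighbours.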
Greedy Meshing Algorithm
Greedy meshing reduces face count by grouping coplanar, identical-ID faces into rectangles. Executed on CPU; results are packed and uploaded to GPU.
High-Level Steps:
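For each of the 3 orthogonal planes (XY, XZ, YZ), handling both face directions:
  Iterate through all slices parallel to that plane (one per position along its normal axis)
  Build a 2D mask of exposed faces (solid voxel next to a non-solid one), storing each face's voxel ID
  For each unprocessed face in the mask:
    Find the greedy rectangle width (extend while the voxel ID matches)
    Find the greedy rectangle height (extend while every cell of the next row matches)
    Mark the covered faces as processed to avoid double-processing
    Calculate AO and light for the rectangle's 4 corners
    Emit the quad's vertices
Separate opaque and water faces into independent ranges of the vertex buffer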
Detailed Algorithm: X-Plane Scanning
Processing YZ-plane slices (one slice per X value; +X faces shown):
for x_slice in range(CHUNK_SIZE):
    # Build a 2D mask over (y, z): voxel ID where the +X face is exposed, 0 otherwise
    mask = np.zeros((CHUNK_SIZE, CHUNK_SIZE), dtype=np.uint8)
    for y in range(CHUNK_SIZE):
        for z in range(CHUNK_SIZE):
            voxel_id = chunk_voxels[x_slice, y, z]
            # Solid voxel whose +X neighbour is non-solid (or outside the chunk)
            if is_solid(voxel_id):
                if x_slice == CHUNK_SIZE - 1 or not is_solid(chunk_voxels[x_slice + 1, y, z]):
                    mask[y, z] = voxel_id
    # Greedy rectangle extraction from the mask
    for y in range(CHUNK_SIZE):
        for z in range(CHUNK_SIZE):
            voxel_id = mask[y, z]
            if voxel_id == 0:
                continue
            # Greedy width: extend along z while the voxel ID matches
            width = 1
            while z + width < CHUNK_SIZE and mask[y, z + width] == voxel_id:
                width += 1
            # Greedy height: extend along y while the whole row segment matches
            height = 1
            while y + height < CHUNK_SIZE:
                if not np.all(mask[y + height, z:z + width] == voxel_id):
                    break
                height += 1
            # Mark processed cells so they are not emitted twice
            mask[y:y + height, z:z + width] = 0
            # The +X face lies on the plane x = x_slice + 1; light and AO come from that air layer
            x_face = x_slice + 1
            l0 = get_vertex_light(x_face, y, z)
            l1 = get_vertex_light(x_face, y + height, z)
            l2 = get_vertex_light(x_face, y + height, z + width)
            l3 = get_vertex_light(x_face, y, z + width)
            ao0 = get_ao(x_face, y, z)
            ao1 = get_ao(x_face, y + height, z)
            ao2 = get_ao(x_face, y + height, z + width)
            ao3 = get_ao(x_face, y, z + width)
            # Flip detection (see section below)
            flip_id = should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3)
            # Emit 2 triangles (6 indices); voxel_id and the corner AO values are needed for packing
            emit_quad(x_face, y, z, width, height, voxel_id, flip_id,
                      (l0, l1, l2, l3), (ao0, ao1, ao2, ao3))
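The -X face uses the same YZ slices but checks the x_slice - 1 neighbour; the Y and Z axes work similarly, iterating through XZ and XY slices respectively, again for both face directions.
Performance Note: This is a hot loop executed once per chunk load. It is compiled with Numba's @njit(cache=True, nogil=True) for a 500x+ speedup over the pure-Python loop; cache=True stores the compiled machine code on disk, and nogil=True releases the GIL so background mesh threads can run in parallel.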
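A self-contained sketch of the greedy rectangle step on a single 2D mask, assuming the mask stores the voxel ID of each exposed face (0 = none) so that only faces with the same ID merge. The name greedy_rectangles and its (row, col, height, width, voxel_id) output format are illustrative, not the engine's API.

```python
import numpy as np

def greedy_rectangles(mask):
    """Greedily merge equal, non-zero cells of a 2D mask into rectangles.

    mask[r, c] holds the voxel ID of an exposed face (0 = no face).
    Returns (row, col, height, width, voxel_id) tuples covering every
    non-zero cell exactly once.
    """
    mask = np.array(mask, copy=True)  # work on a copy; consumed cells are zeroed
    rows, cols = mask.shape
    quads = []
    for r in range(rows):
        for c in range(cols):
            vid = mask[r, c]
            if vid == 0:
                continue
            # Extend along the column axis while the voxel ID matches
            width = 1
            while c + width < cols and mask[r, c + width] == vid:
                width += 1
            # Extend along the row axis while the whole width-wide segment matches
            height = 1
            while r + height < rows and np.all(mask[r + height, c:c + width] == vid):
                height += 1
            mask[r:r + height, c:c + width] = 0  # mark as consumed
            quads.append((r, c, height, width, int(vid)))
    return quads
```

For example, a 2x2 block of ID 1 next to a vertical strip of ID 2 yields exactly two quads, and a full 48x48 slice of one ID collapses to a single quad.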
Vertex Light Smoothing
Light values are interpolated to vertices for smooth shading. Each vertex is shared by up to 8 blocks; we sample light from the 4 (on a plane) or 8 blocks surrounding that vertex.
For an X-plane face (normal along +X):
Four corners of the quad correspond to YZ positions. For each corner, sample from 4 blocks:
def get_vertex_light(x, y, z, plane='X'):
    # The coordinate along the face normal is the block layer in front of the face
    # (x_slice + 1 for a +X face); the other two are the vertex's grid position.
    # A vertex at grid position (y, z) touches the blocks spanning y-1..y and z-1..z.
    if plane == 'X':
        # YZ quad
        l1 = get_light(x, y - 1, z - 1)
        l2 = get_light(x, y, z - 1)
        l3 = get_light(x, y, z)
        l4 = get_light(x, y - 1, z)
    elif plane == 'Y':
        # XZ quad
        l1 = get_light(x - 1, y, z - 1)
        l2 = get_light(x, y, z - 1)
        l3 = get_light(x, y, z)
        l4 = get_light(x - 1, y, z)
    else:
        # plane == 'Z': XY quad
        l1 = get_light(x - 1, y - 1, z)
        l2 = get_light(x, y - 1, z)
        l3 = get_light(x, y, z)
        l4 = get_light(x - 1, y, z)
    # Average sunlight and blocklight separately (simple mean, or weighted by AO).
    # Parentheses are required: + binds tighter than >> in Python; // keeps the result an int.
    avg_sun = ((l1 >> 4) + (l2 >> 4) + (l3 >> 4) + (l4 >> 4)) // 4
    avg_block = ((l1 & 15) + (l2 & 15) + (l3 & 15) + (l4 & 15)) // 4
    return (avg_sun << 4) | avg_block
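The same 2x2 averaging can be done for every vertex of a slice at once with NumPy slicing. This sketch assumes the packed light of the air-side layer is already available as a 2D array and, as a simplification, pads the layer edges by copying the nearest cell instead of reading neighbouring chunks; smooth_vertex_light is a hypothetical name.

```python
import numpy as np

def smooth_vertex_light(layer):
    """Average packed light (sun << 4 | block) over the 2x2 blocks touching each vertex.

    layer: (H, W) array of packed light for the layer in front of the faces.
    Returns an (H + 1, W + 1) uint32 array, one packed value per grid vertex.
    Assumption: cells outside the layer copy their nearest edge cell.
    """
    padded = np.pad(np.asarray(layer, dtype=np.uint32), 1, mode="edge")
    sun = (padded >> 4) & 0xF
    block = padded & 0xF

    def corner_mean(channel):
        # Vertex (i, j) touches padded cells (i, j), (i+1, j), (i, j+1), (i+1, j+1)
        total = channel[:-1, :-1] + channel[1:, :-1] + channel[:-1, 1:] + channel[1:, 1:]
        return total // 4  # integer mean keeps each channel within 0-15

    return (corner_mean(sun) << 4) | corner_mean(block)
```

Sunlight and blocklight are averaged separately, as in get_vertex_light, so one channel never bleeds into the other's bits.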
Ambient Occlusion (AO) Calculation
AO darkens corners where multiple solid blocks converge, simulating soft shadows.
Corner Occlusion (for X-plane, Y-Z corner):
For each of the 4 corners of a quad, check the 2x2 blocks that touch it in the layer in front of the face:
def get_ao(x, corner_y, corner_z, plane='X'):
    # plane='X': x is the block layer in front of the face (x_slice + 1 for a +X face);
    # check the 4 blocks of that layer spanning y-1..y and z-1..z around the corner
    ao_count = 0
    if not is_transparent(voxel_at(x, corner_y - 1, corner_z - 1)):
        ao_count += 1
    if not is_transparent(voxel_at(x, corner_y, corner_z - 1)):
        ao_count += 1
    if not is_transparent(voxel_at(x, corner_y - 1, corner_z)):
        ao_count += 1
    if not is_transparent(voxel_at(x, corner_y, corner_z)):
        ao_count += 1
    # ao_count ranges 0-4, but only 0-3 fits in the 2-bit field
    # 0 = bright, 1 = slightly dark, 2 = moderately dark, 3 = very dark
    return min(ao_count, 3)
Transparency Check:
Transparent blocks (AIR, WATER, GLASS, LEAVES) do not cast AO shadows:
def is_transparent(voxel_id):
return voxel_id in [AIR, WATER, GLASS, LEAVES]
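A vectorized variant of is_transparent and get_ao for a whole slice, sketched with a 256-entry boolean lookup table. The numeric block IDs below are hypothetical placeholders, and cells outside the layer are treated as transparent, which a real mesher would replace with neighbour-chunk lookups.

```python
import numpy as np

# Hypothetical block IDs for illustration; the engine's real values may differ
AIR, STONE, WATER, GLASS, LEAVES = 0, 1, 2, 3, 4

# TRANSPARENT_LUT[voxel_id] is True for blocks that cast no AO shadow
TRANSPARENT_LUT = np.zeros(256, dtype=bool)
TRANSPARENT_LUT[[AIR, WATER, GLASS, LEAVES]] = True

def slice_ao(air_layer_ids):
    """AO level (0 = bright ... 3 = very dark) for every vertex of a slice.

    air_layer_ids: (H, W) voxel IDs of the layer in front of the faces.
    Returns an (H + 1, W + 1) array of AO levels.
    """
    occluder = ~TRANSPARENT_LUT[np.asarray(air_layer_ids, dtype=np.uint8)]
    padded = np.pad(occluder, 1, constant_values=False).astype(np.uint8)
    # Count occluders among the 4 cells touching each vertex
    count = padded[:-1, :-1] + padded[1:, :-1] + padded[:-1, 1:] + padded[1:, 1:]
    return np.minimum(count, 3)  # clamp to the 2-bit range, as in get_ao
```

The lookup table replaces the list membership test with a single array index, which also works inside Numba-compiled code.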
GPU Application (in Vertex Shader):
// Indexed by ao_id: 0 = bright ... 3 = very dark (matches get_ao)
const float ao_values[4] = float[4](1.0, 0.5, 0.25, 0.1);
// Unpack ao_id (2 bits)
int ao_id = int((packed_data >> 1) & 0x3u);
// Apply to shading
shading = base_light * ao_values[ao_id];
Flip Detection (Diagonal Flip for Lighting)
When lighting is uneven across a quad, flipping the diagonal can improve visual appearance. This is determined by comparing lighting sums across the two diagonals.
Algorithm:
def should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3):
    # l0..l3 = packed light at the 4 corners (sun << 4 | block)
    # ao0..ao3 = AO level at the 4 corners (0 = bright ... 3 = very dark)

    def extract_light(l):
        # sun + block (simplified brightness measure)
        return (l >> 4) + (l & 15)

    # Diagonal 1 joins corners 0 and 2; diagonal 2 joins corners 1 and 3.
    # AO measures darkness, so it is subtracted from brightness.
    diag1_brightness = extract_light(l0) + extract_light(l2) - (ao0 + ao2)
    diag2_brightness = extract_light(l1) + extract_light(l3) - (ao1 + ao3)
    # Flip if diagonal 1 is brighter; the strict > keeps the standard diagonal on ties
    return diag1_brightness > diag2_brightness
GPU Application:
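Triangles are assembled from the order in which vertices are emitted, so in practice flip_id selects the splitting diagonal when the quad's vertices are emitted; the vertex shader then uses flip_id to pick UV coordinates (or corner offsets) that match that vertex order.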
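To illustrate how flip_id changes the triangulation, this sketch expands a quad's 4 corner records into 6 vertices. It assumes the corner order of the X-plane loop (0 = (y, z), 1 = (y+h, z), 2 = (y+h, z+w), 3 = (y, z+w)) and that the unflipped split runs along the 0-2 diagonal; the names STANDARD, FLIPPED, and emit_quad_vertices are hypothetical.

```python
# Corner order assumed from the X-plane loop: 0=(y,z), 1=(y+h,z), 2=(y+h,z+w), 3=(y,z+w)
STANDARD = (0, 1, 2, 0, 2, 3)  # two triangles sharing the 0-2 diagonal
FLIPPED = (1, 2, 3, 1, 3, 0)   # two triangles sharing the 1-3 diagonal

def quad_vertex_order(flip_id):
    """Corner indices for the 6 vertices of a quad, chosen by flip_id."""
    return FLIPPED if flip_id else STANDARD

def emit_quad_vertices(corners, flip_id):
    """Expand 4 per-corner records (packed words, light values, ...) into 6 vertices."""
    return [corners[i] for i in quad_vertex_order(flip_id)]
```

Both orders walk the corners in the same cyclic direction, so the winding (and therefore back-face culling) is unchanged by a flip.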
Water Faces Handling
Water is rendered in a separate pass after the opaque geometry, so it can be alpha-blended over the already-drawn opaque scene.
Algorithm:
During greedy meshing, water faces (voxel_id == WATER) are marked separately
Opaque faces emitted first, water faces appended to same buffer
Render call splits: draw opaque faces first (full depth test), then draw water faces (transparency blending enabled)
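When faces are emitted in mixed order, the opaque-then-water layout can be produced afterwards with a stable partition. This sketch decodes voxel_id from bits 13-6 of each packed vertex and reorders the light data identically; split_water and its water_id parameter are hypothetical. Because all 6 vertices of a quad share one voxel ID, each quad's triangles stay contiguous.

```python
import numpy as np

def split_water(vertex_data, light_data, water_id):
    """Reorder vertices as [opaque..., water...] and return the draw-range sizes."""
    vertex_data = np.asarray(vertex_data, dtype=np.uint32)
    light_data = np.asarray(light_data, dtype=np.uint32)
    is_water = ((vertex_data >> 6) & 0xFF) == water_id  # voxel_id lives in bits 13-6
    # Stable partition: original order is kept within each group
    order = np.concatenate([np.flatnonzero(~is_water), np.flatnonzero(is_water)])
    opaque_count = int(np.count_nonzero(~is_water))
    water_count = len(vertex_data) - opaque_count
    # opaque pass: first=0, vertices=opaque_count; water pass: first=opaque_count
    return vertex_data[order], light_data[order], opaque_count, water_count
```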
Separate render calls (host-side Python, ModernGL):
# Opaque pass
ctx.enable(moderngl.DEPTH_TEST)
ctx.disable(moderngl.BLEND)
vao.render(mode=moderngl.TRIANGLES, vertices=opaque_count)
# Water pass
ctx.enable(moderngl.BLEND)
ctx.blend_func = (moderngl.SRC_ALPHA, moderngl.ONE_MINUS_SRC_ALPHA)
vao.render(mode=moderngl.TRIANGLES, vertices=water_count, first=opaque_count)
Mesh Building Pipeline (CPU to GPU)
Sequential Process:
Chunk Load (Background Thread):
  Generate or fetch voxel data from the database
  Place it in load_queue
Mesh Build (Background Thread via ThreadPoolExecutor):
  Pop a chunk from load_queue
  Run greedy meshing: build_chunk_mesh(chunk_voxels, chunk_lightmap)
  Output: vertex_data (flat uint32 array), light_data (flat uint32 array)
  Place the result in build_queue
GPU Upload (Main Thread):
  Pop from mesh_queue (build_queue results after lighting stitching)
  Create the VBO (reused from the VBO pool when one is available, otherwise newly allocated) and wrap it in a VAO with ctx.vertex_array(program, [(vbo, '1u', <input name>)]), where <input name> is the vertex shader's packed-data attribute
  Store them in the chunk.mesh object
Rendering (Main Thread, per frame):
  Frustum cull active chunks
  Use occlusion queries to skip chunks that are hidden
  Bind the shader and draw the visible chunk meshes
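A minimal sketch of the background-build / main-thread-upload handoff using queue.Queue and ThreadPoolExecutor. It collapses load_queue, build_queue, and mesh_queue into a single build_queue, replaces build_chunk_mesh with a hypothetical stub, and adds a hypothetical per-frame upload limit (max_per_frame); the engine's real queue handling may differ.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

import numpy as np

build_queue = queue.Queue()  # (chunk_pos, vertex_data, light_data) ready for upload

def build_chunk_mesh_stub(chunk_voxels):
    # Hypothetical stand-in for build_chunk_mesh: one "vertex" per solid voxel
    solid = np.flatnonzero(chunk_voxels)
    return solid.astype(np.uint32), np.zeros(len(solid), dtype=np.uint32)

def mesh_job(chunk_pos, chunk_voxels):
    # Runs on a worker thread; only CPU work happens here, no GL calls
    vertex_data, light_data = build_chunk_mesh_stub(chunk_voxels)
    build_queue.put((chunk_pos, vertex_data, light_data))

def drain_uploads(upload, max_per_frame=4):
    """Main thread: hand at most max_per_frame finished meshes to `upload`."""
    done = 0
    while done < max_per_frame:
        try:
            chunk_pos, vertex_data, light_data = build_queue.get_nowait()
        except queue.Empty:
            break
        upload(chunk_pos, vertex_data, light_data)  # e.g. ctx.buffer(...) + ctx.vertex_array(...)
        done += 1
    return done
```

Keeping every GL call inside `upload` on the main thread matters because an OpenGL context is normally current on only one thread; the workers only produce NumPy arrays.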
VBO Pool (Memory Recycling):
vbo_pool = []  # List of unused VBOs
VBO_POOL_CAP = 150

def get_or_create_vbo(ctx, data):
    # data: bytes, e.g. vertex_data.tobytes()
    if vbo_pool:
        vbo = vbo_pool.pop()
        if vbo.size < len(data):
            vbo.orphan(len(data))  # grow the GPU allocation so the write fits
        vbo.write(data)  # Overwrite with new data
    else:
        vbo = ctx.buffer(data)
    return vbo

def release_vbo(vbo):
    if len(vbo_pool) < VBO_POOL_CAP:
        vbo_pool.append(vbo)
    else:
        vbo.release()  # Destroy GPU memory
Data Flow Example
Raw Chunk Voxels (1D array, 110,592 elements)
↓
[Greedy Meshing: CPU]
↓
Packed Vertex Data (e.g., 10,000 vertices for a grass chunk)
↓
[Lighting Stitching: CPU]
↓
Light Data (10,000 light values)
↓
[GPU Upload: Main Thread]
↓
VBO/VAO allocated on GPU
↓
[Rendering: per frame]
↓
Vertices unpacked in Vertex Shader → Position + Attributes
↓
Fragment Shader colors pixels
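The diagram starts from a flat 110,592-element array (48 × 48 × 48) while the meshing loops index chunk_voxels[x, y, z]. Assuming C-order storage (x slowest, z fastest), which the engine may or may not use, a NumPy reshape gives a 3D view without copying, and the hypothetical flat_index helper reproduces the same mapping for scalar lookups.

```python
import numpy as np

CHUNK_SIZE = 48
CHUNK_VOL = CHUNK_SIZE ** 3  # 110,592 voxels per chunk

def flat_index(x, y, z):
    # Assumed C-order layout (x slowest, z fastest), matching the reshape below
    return (x * CHUNK_SIZE + y) * CHUNK_SIZE + z

flat = np.zeros(CHUNK_VOL, dtype=np.uint8)                   # 1D storage
voxels = flat.reshape(CHUNK_SIZE, CHUNK_SIZE, CHUNK_SIZE)    # 3D view of the same memory
```

For scale, the example's 10,000 packed vertices occupy 40,000 bytes, plus another 40,000 bytes for the matching uint32 light values.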
Custom Mesh Variants
CloudMesh:
Fixed procedural clouds (no greedy meshing). Uses 2D Simplex noise to determine cloud density at each point. Emits simplified geometry.
ItemMesh:
Dropped items (pickaxe, stick, etc.) use simplified meshes. No greedy meshing; pre-defined vertex data per item type.
ObjMesh:
Loads Wavefront .obj files (trees, decorative structures). Parses vertices, UVs, normals. Stores as-is; no meshing.
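A minimal sketch of the kind of parsing ObjMesh performs, reading only v, vt, vn, and f statements and converting the 1-based OBJ indices to 0-based. parse_obj is a hypothetical helper; materials, groups, and negative (relative) indices are not handled.

```python
def parse_obj(text):
    """Minimal Wavefront .obj reader for v, vt, vn and f statements.

    Returns (positions, uvs, normals, faces); each face is a list of
    (v, vt, vn) 0-based index tuples, with None for a missing vt or vn.
    """
    positions, uvs, normals, faces = [], [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # blank line or comment
        tag, args = parts[0], parts[1:]
        if tag == "v":
            positions.append(tuple(float(a) for a in args[:3]))
        elif tag == "vt":
            uvs.append(tuple(float(a) for a in args[:2]))
        elif tag == "vn":
            normals.append(tuple(float(a) for a in args[:3]))
        elif tag == "f":
            face = []
            for ref in args:
                idx = ref.split("/")  # forms: v, v/vt, v//vn, v/vt/vn
                v = int(idx[0]) - 1
                vt = int(idx[1]) - 1 if len(idx) > 1 and idx[1] else None
                vn = int(idx[2]) - 1 if len(idx) > 2 and idx[2] else None
                face.append((v, vt, vn))
            faces.append(face)
    return positions, uvs, normals, faces
```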
Replication Guide
To reimplement greedy meshing from scratch:
Load voxel data into 3D array or flattened 1D array
For each of 3 planes, in both face directions:
  a. Build a 2D mask of exposed faces (iterate through the slice)
  b. Extract rectangles greedily (nested loop with width/height expansion)
  c. For each rectangle, calculate the 4 corner light and AO values
  d. Pack into 32-bit integers
  e. Emit 2 triangles (6 indices) for the quad
Separate water faces from opaque
Upload to GPU as VBO
Render with phased passes (opaque → transparent)
Pseudocode:
def build_mesh(chunk_voxels):
    vertex_data = []
    light_data = []
    # Process each plane (both face directions per plane)
    for plane in ['X', 'Y', 'Z']:
        for slice_idx in range(CHUNK_SIZE):
            mask = build_mask(chunk_voxels, plane, slice_idx)
            processed = set()
            for start_y, start_z in iterate_mask(mask):
                if (start_y, start_z) in processed:
                    continue
                # greedy_expand adds every cell of the rectangle to `processed`
                width, height = greedy_expand(mask, start_y, start_z, processed)
                corners_light = sample_4_corners(plane, slice_idx, start_y, start_z, width, height)
                corners_ao = sample_4_corners_ao(plane, slice_idx, start_y, start_z, width, height)
                flip = compute_flip(corners_light, corners_ao)
                packed = pack_vertices(slice_idx, start_y, start_z, width, height, corners_ao, flip)
                vertex_data.extend(packed)
                # Hypothetical helper: one light value per emitted vertex, same order as `packed`
                light_data.extend(pack_corner_lights(corners_light, flip))
    return np.array(vertex_data, dtype=np.uint32), np.array(light_data, dtype=np.uint32)
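One way to implement the hypothetical build_mask helper is to compute exposed-face masks for every slice of an axis at once with NumPy. Here face_masks and is_solid_lut are illustrative names, and chunk borders are treated as exposed rather than checked against neighbouring chunks. Slice k of each result along the chosen axis is the 2D mask for slice_idx = k.

```python
import numpy as np

def face_masks(voxels, axis, is_solid_lut):
    """Exposed-face masks along one axis, for both face directions.

    voxels: 3D array of voxel IDs (e.g. 48x48x48).
    is_solid_lut: boolean lookup table indexed by voxel ID.
    Returns (pos, neg), shaped like voxels: the voxel ID where the +axis / -axis
    face is exposed, 0 elsewhere. Faces on the chunk border count as exposed.
    """
    v = np.moveaxis(np.asarray(voxels), axis, 0)  # put the scanned axis first
    solid = is_solid_lut[v]
    # Solidity of the neighbour in the +axis and -axis directions (False past the border)
    next_solid = np.zeros_like(solid)
    next_solid[:-1] = solid[1:]
    prev_solid = np.zeros_like(solid)
    prev_solid[1:] = solid[:-1]
    pos = np.where(solid & ~next_solid, v, 0)
    neg = np.where(solid & ~prev_solid, v, 0)
    return np.moveaxis(pos, 0, axis), np.moveaxis(neg, 0, axis)
```

Storing voxel IDs rather than booleans lets the greedy step refuse to merge faces of different block types.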