Mesh Systems and Vertex Packing

This document provides a complete, replicable guide to Pyrite’s mesh architecture, including data structures, vertex packing, the greedy meshing algorithm, and ambient occlusion (AO) calculation.

Overview

Pyrite renders millions of voxels using a multi-stage mesh pipeline:

  1. Greedy Meshing: CPU-side algorithm groups adjacent coplanar faces into large rectangular polygons.

  2. Vertex Packing: Vertex attributes compressed into 32-bit integers to minimize GPU memory.

  3. Lighting: Per-vertex smoothed light values sampled from adjacent blocks.

  4. AO Calculation: Corner darkness determined by surrounding block density.

  5. GPU Upload: Main thread creates VAO/VBO objects from packed data.

  6. Rendering: Draw calls per mesh (opaque + transparent passes).

Mesh Classes Hierarchy

BaseMesh (Core Abstract Class)

Base class for all mesh types. Defines the interface:

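import moderngl

class BaseMesh:
    def __init__(self, ctx, program):
        self.ctx = ctx                # ModernGL context
        self.program = program        # Shader program to bind during render
        self.vao = None               # Vertex Array Object (initially None)
        self.vbo = None               # Vertex Buffer Object (initially None)
        self.vertex_count = 0
        self.index_count = 0
        self.render_mode = moderngl.TRIANGLES  # GL_TRIANGLES by default

    def render(self):
        # Bind program, VAO, draw (implementation varies per subclass)
        raise NotImplementedError

    def render_instanced(self):
        # Render multiple instances
        raise NotImplementedError

    def destroy(self):
        # Release GPU resources
        if self.vao is not None:
            self.vao.release()
        if self.vbo is not None:
            self.vbo.release()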

ChunkMesh (Per-Chunk Geometry)

Represents a single 48x48x48 chunk. Created during chunk building.

class ChunkMesh(BaseMesh):
    def __init__(self, chunk_voxels, chunk_lightmap, chunk_pos, ctx, program, world_data):
        super().__init__(ctx, program)
        # Greedy mesh the chunk and store vertex data in these attributes
        self.chunk_pos: tuple[int, int, int]  # Chunk coordinates in world
        self.vertex_data: np.ndarray          # uint32, packed vertices
        self.light_data: np.ndarray           # uint32, packed light
        self.opaque_count: int                # Opaque face count
        self.water_count: int                 # Water face count

    def render(self):
        # Render opaque faces, then water faces, in separate passes
        ...

CloudMesh, CubeMesh, ItemMesh, ObjMesh (Specialized)

  • CloudMesh: Fixed 2D procedural clouds (sky)

  • CubeMesh: Renders static cubes (UI, debugging)

  • ItemMesh: Item entities dropped in world

  • ObjMesh: Wavefront .obj models (trees, items)

Vertex Packing: 32-Bit Format

To minimize GPU bandwidth and memory, each vertex attribute is bit-packed into a single 32-bit unsigned integer.

Packed Vertex Layout:

Bits 31-26 (6 bits): X coordinate (0-48; quad corners can sit on the far chunk boundary)
Bits 25-20 (6 bits): Y coordinate (0-48)
Bits 19-14 (6 bits): Z coordinate (0-48)
Bits 13-6  (8 bits): Voxel ID (0-255)
Bits 5-3   (3 bits): Face ID (0-5, one of 6 faces)
Bits 2-1   (2 bits): AO ID (0-3, ambient occlusion level)
Bit  0     (1 bit):  Flip ID (0 or 1, diagonal flip flag)

Total: 32 bits = 4 bytes per vertex, plus the separate 4-byte light value described below (vs. 16 bytes for an unpacked layout of (x, y, z, id, face, ao, flip, light)).

Packing Formula:

packed_data = (x & 0x3F) << 26 \
            | (y & 0x3F) << 20 \
            | (z & 0x3F) << 14 \
            | (voxel_id & 0xFF) << 6 \
            | (face_id & 0x7) << 3 \
            | (ao_id & 0x3) << 1 \
            | (flip_id & 0x1)

Unpacking (in Vertex Shader):

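void unpack(uint packed_data) {
    // x, y, z, voxel_id, face_id, ao_id and flip_id are assumed to be ints declared at shader scope.
    // Masks carry the u suffix so both operands of & are unsigned (required before GLSL 4.00).
    x = int((packed_data >> 26) & 0x3Fu);
    y = int((packed_data >> 20) & 0x3Fu);
    z = int((packed_data >> 14) & 0x3Fu);
    voxel_id = int((packed_data >> 6) & 0xFFu);
    face_id = int((packed_data >> 3) & 0x7u);
    ao_id = int((packed_data >> 1) & 0x3u);
    flip_id = int(packed_data & 0x1u);
}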
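The layout can be checked on the CPU with a round trip. The following is a minimal sketch, not the engine's own code: pack_vertex and unpack_vertex are illustrative names, and the shifts and masks are copied from the layout table above.

```python
def pack_vertex(x, y, z, voxel_id, face_id, ao_id, flip_id):
    """Pack seven small fields into one 32-bit unsigned value."""
    return (
        ((x & 0x3F) << 26)
        | ((y & 0x3F) << 20)
        | ((z & 0x3F) << 14)
        | ((voxel_id & 0xFF) << 6)
        | ((face_id & 0x7) << 3)
        | ((ao_id & 0x3) << 1)
        | (flip_id & 0x1)
    )


def unpack_vertex(packed):
    """Inverse of pack_vertex; mirrors the shader's shifts and masks."""
    return (
        (packed >> 26) & 0x3F,
        (packed >> 20) & 0x3F,
        (packed >> 14) & 0x3F,
        (packed >> 6) & 0xFF,
        (packed >> 3) & 0x7,
        (packed >> 1) & 0x3,
        packed & 0x1,
    )


fields = (47, 0, 17, 255, 5, 3, 1)
packed = pack_vertex(*fields)
assert packed < 2**32              # always fits in a uint32
assert unpack_vertex(packed) == fields
```

Because every field is masked before shifting, an out-of-range input is silently truncated rather than corrupting its neighbours; a debug build may prefer to assert on the ranges instead.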

Light Data (Separate uint32):

Bits 7-4 (4 bits): Sunlight (0-15)
Bits 3-0 (4 bits): Blocklight (0-15)
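A small sketch of the light layout, assuming only the two nibbles listed above are used and the upper 24 bits stay zero. The helper names are illustrative.

```python
def pack_light(sun, block):
    """Sunlight in bits 7-4, blocklight in bits 3-0."""
    return ((sun & 0xF) << 4) | (block & 0xF)


def unpack_light(light):
    """Return (sunlight, blocklight), each in the range 0-15."""
    return (light >> 4) & 0xF, light & 0xF


full_sun_torch_7 = pack_light(15, 7)
assert full_sun_torch_7 == 0xF7
assert unpack_light(full_sun_torch_7) == (15, 7)
```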

Greedy Meshing Algorithm

Greedy meshing reduces face count by merging coplanar faces that share the same voxel ID into rectangles. It runs on the CPU; the results are packed and uploaded to the GPU.

High-Level Steps:

  1. For each of 3 orthogonal planes (XY, XZ, YZ):

    1. Iterate through all slices perpendicular to that plane

    2. Build 2D mask of solid vs. transparent voxels

    3. For each solid voxel with exposed face:

      • Calculate AO and light for all 4 corners

      • Find greedy horizontal rectangle width

      • Find greedy vertical rectangle height

      • Emit quad vertices

    4. Mark processed faces to avoid double-processing

  2. Separate opaque and water faces so they can be drawn in independent passes (see Water Faces Handling)

Detailed Algorithm: X-Plane Scanning

Processing YZ-plane slices (X varying):

for x_slice in range(CHUNK_SIZE):
    # Build 2D mask over (y, z): 0 = no face, otherwise the ID of the voxel
    # whose +X face is exposed (assumes ID 0 is AIR). Storing the ID lets the
    # merge below combine only identical-ID faces.
    mask = np.zeros((CHUNK_SIZE, CHUNK_SIZE), dtype=np.uint8)

    for y in range(CHUNK_SIZE):
        for z in range(CHUNK_SIZE):
            voxel_id = chunk_voxels[x_slice, y, z]

            # Check if solid and has exposed +X face
            if is_solid(voxel_id):
                if x_slice == CHUNK_SIZE - 1 or not is_solid(chunk_voxels[x_slice + 1, y, z]):
                    mask[y, z] = voxel_id

    # Greedy rectangle extraction from mask
    for y in range(CHUNK_SIZE):
        for z in range(CHUNK_SIZE):
            voxel_id = mask[y, z]
            if voxel_id == 0:
                continue

            # Find greedy width (extend along z-axis)
            width = 1
            while z + width < CHUNK_SIZE and mask[y, z + width] == voxel_id:
                width += 1

            # Find greedy height (extend along y-axis)
            height = 1
            valid = True
            while y + height < CHUNK_SIZE and valid:
                for z_check in range(z, z + width):
                    if mask[y + height, z_check] != voxel_id:
                        valid = False
                        break
                if valid:
                    height += 1

            # Mark processed to avoid overlap
            for dy in range(height):
                for dz in range(width):
                    mask[y + dy, z + dz] = 0

            # Get 4 corner light/AO values
            l0 = get_vertex_light(x_slice, y, z)
            l1 = get_vertex_light(x_slice, y + height, z)
            l2 = get_vertex_light(x_slice, y + height, z + width)
            l3 = get_vertex_light(x_slice, y, z + width)

            ao0 = get_ao(y, z)
            ao1 = get_ao(y + height, z)
            ao2 = get_ao(y + height, z + width)
            ao3 = get_ao(y, z + width)

            # Flip detection (see section below)
            flip_id = should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3)

            # Emit 2 triangles (6 indices); the packed vertices need the voxel ID and AO too
            emit_quad(x_slice, y, z, width, height, voxel_id, flip_id,
                      (l0, l1, l2, l3), (ao0, ao1, ao2, ao3))

Y and Z Plane Scanning work similarly, iterating through XZ and XY slices respectively.
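The rectangle extraction step can be tried in isolation on a small mask. The sketch below is an assumption-laden illustration rather than the engine's Numba kernel: greedy_rects is a hypothetical name, the mask stores voxel IDs with 0 meaning "no face", and only cells with equal IDs are merged.

```python
import numpy as np


def greedy_rects(mask):
    """Cover every nonzero cell of a 2D ID mask with rectangles.

    Returns a list of (row, col, height, width, voxel_id) tuples.
    """
    mask = mask.copy()  # cells are cleared as they are consumed
    rows, cols = mask.shape
    rects = []
    for r in range(rows):
        c = 0
        while c < cols:
            value = mask[r, c]
            if value == 0:
                c += 1
                continue
            # Grow along the row while the ID matches
            width = 1
            while c + width < cols and mask[r, c + width] == value:
                width += 1
            # Grow downward while the whole strip matches
            height = 1
            while r + height < rows and np.all(mask[r + height, c:c + width] == value):
                height += 1
            mask[r:r + height, c:c + width] = 0
            rects.append((r, c, height, width, int(value)))
            c += width
    return rects


demo = np.array([
    [1, 1, 2, 0],
    [1, 1, 2, 0],
    [1, 1, 0, 3],
], dtype=np.uint8)
rects = greedy_rects(demo)
assert rects == [(0, 0, 3, 2, 1), (0, 2, 2, 1, 2), (2, 3, 1, 1, 3)]
```

Nine exposed faces collapse into three quads here. The result is not guaranteed to be the minimum number of rectangles; greedy expansion trades optimality for a single linear pass.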

Performance Note: This hot loop runs once per chunk load. It is compiled with Numba using @njit(cache=True, nogil=True), for a reported 500x+ speedup over interpreted Python.

Vertex Light Smoothing

Light values are interpolated to vertices for smooth shading. Each vertex is shared by up to 8 blocks; we sample light from the 4 (on a plane) or 8 blocks surrounding that vertex.

For X-Plane Face (perpendicular normal = +X):

Four corners of the quad correspond to YZ positions. For each corner, sample from 4 blocks:

def get_vertex_light(x, y, z, plane='X'):
    # plane='X' means YZ quad; sample from 4 blocks around corner

    if plane == 'X':
        # Corner at (x, y, z) in YZ space samples:
        l1 = get_light(x, y, z)           # Lower-left
        l2 = get_light(x, y+1, z)         # Upper-left
        l3 = get_light(x, y+1, z+1)       # Upper-right
        l4 = get_light(x, y, z+1)         # Lower-right
    elif plane == 'Y':
        # Similar for XZ plane
        l1 = get_light(x, y, z)
        l2 = get_light(x+1, y, z)
        l3 = get_light(x+1, y, z+1)
        l4 = get_light(x, y, z+1)
    # ... etc for Z plane

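    # Average light (simple mean, or weighted by AO).
    # Each shift is parenthesized because + binds tighter than >> in Python,
    # and // keeps the result an integer so it can be shifted back.
    avg_sun = ((l1 >> 4) + (l2 >> 4) + (l3 >> 4) + (l4 >> 4)) // 4
    avg_block = ((l1 & 15) + (l2 & 15) + (l3 & 15) + (l4 & 15)) // 4

    return (avg_sun << 4) | avg_block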
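The averaging step on its own, as a runnable sketch. average_light is an illustrative helper that takes the already-sampled packed light bytes, so it makes no assumption about which blocks were sampled.

```python
def average_light(samples):
    """Average packed light bytes channel by channel.

    Each sample holds sunlight in bits 7-4 and blocklight in bits 3-0.
    """
    n = len(samples)
    avg_sun = sum((s >> 4) & 0xF for s in samples) // n
    avg_block = sum(s & 0xF for s in samples) // n
    return (avg_sun << 4) | avg_block


# Two fully sunlit blocks, one half-lit block, one torch-lit block
corner = average_light([0xF0, 0xF0, 0x80, 0x04])
assert corner == 0x91  # sun (15+15+8+0)//4 = 9, block (0+0+0+4)//4 = 1

# Operator precedence pitfall: without parentheses the shifts chain
assert (0xF0 >> 4 + 0xF0 >> 4) == 0
```

Integer division rounds down, so a single dim sample can pull a corner one level darker; rounding to nearest is an equally reasonable choice if banding is visible.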

Ambient Occlusion (AO) Calculation

AO darkens corners where multiple solid blocks converge, simulating soft shadows.

Corner Occlusion (for X-plane, Y-Z corner):

For each of the 4 corners of a quad, check 2x2 adjacent blocks:

def get_ao(corner_y, corner_z, plane='X'):
    # plane='X': Check blocks in YZ plane around corner

    # Top-left, Top-right, Bottom-left, Bottom-right (relative to corner)
    occluders = 0

    for dy in (-1, 0):
        for dz in (-1, 0):
            if not is_transparent(voxel_at(corner_y + dy, corner_z + dz)):
                occluders += 1

    # occluders ranges 0-4, but only 2 bits are stored. The stored ID is a
    # brightness level that indexes ao_values in the shader, so it is inverted:
    # 3 = unoccluded, 2 = slightly dark, 1 = moderately dark, 0 = very dark
    return 3 - min(occluders, 3)

Transparency Check:

Transparent blocks (AIR, WATER, GLASS, LEAVES) do not cast AO shadows:

def is_transparent(voxel_id):
    return voxel_id in [AIR, WATER, GLASS, LEAVES]

GPU Application (in Vertex Shader):

const float ao_values[4] = float[4](0.1, 0.25, 0.5, 1.0);

// Unpack ao_id (2 bits)
int ao_id = int((packed_data >> 1) & 0x3u);

// Apply to shading
shading = base_light * ao_values[ao_id];
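A CPU-side sketch of how a corner could be mapped to a shading multiplier. It assumes the AO ID indexes ao_values directly, so ID 3 means unoccluded and ID 0 means fully occluded, and it treats cells outside the array as open (a real mesher would consult the neighbouring chunk). corner_ao_id and AO_VALUES are illustrative names.

```python
import numpy as np

AO_VALUES = (0.1, 0.25, 0.5, 1.0)  # same table as the shader snippet


def corner_ao_id(solid, cy, cz):
    """AO ID for corner (cy, cz) of a 2D layer.

    solid: 2D bool array, True where a block occludes.
    The corner touches cells (cy-1..cy, cz-1..cz).
    """
    occluders = 0
    for y in (cy - 1, cy):
        for z in (cz - 1, cz):
            inside = 0 <= y < solid.shape[0] and 0 <= z < solid.shape[1]
            if inside and solid[y, z]:
                occluders += 1
    return 3 - min(occluders, 3)


layer = np.zeros((3, 3), dtype=bool)
layer[0, 0] = True
layer[0, 1] = True

assert corner_ao_id(layer, 2, 2) == 3   # open corner, multiplier 1.0
assert corner_ao_id(layer, 0, 0) == 2   # one occluder, multiplier 0.5
assert corner_ao_id(layer, 1, 1) == 1   # two occluders, multiplier 0.25
assert AO_VALUES[corner_ao_id(layer, 1, 1)] == 0.25
```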

Flip Detection (Diagonal Flip for Lighting)

When lighting is uneven across a quad, flipping the diagonal can improve visual appearance. This is determined by comparing lighting sums across the two diagonals.

Algorithm:

def should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3):
    # l0, l1, l2, l3 = light at 4 corners (packed uint32)
    # ao0, ao1, ao2, ao3 = AO at 4 corners

    # Extract sun and block light
    def extract_light(l):
        return (l >> 4) + (l & 15)  # sun + block (simplified)

    # Diagonal 1: (0,2) and Diagonal 2: (1,3)
    diag1_brightness = extract_light(l0) + extract_light(l2) + (ao0 + ao2)
    diag2_brightness = extract_light(l1) + extract_light(l3) + (ao1 + ao3)

    # Flip only if diagonal 1 is strictly brighter; ties keep the standard diagonal.
    # Returned as 0 or 1 so it can be packed into the 1-bit flip field.
    return int(diag1_brightness > diag2_brightness)

GPU Application:

During rendering, the vertex shader uses flip_id to adjust vertex positions or UV coordinates accordingly.
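One way to picture the effect of flip_id is as a choice between two triangle index orders for the same four corners. The sketch below is an assumption: the source does not state which diagonal is the default, so quad_indices and its two index tuples are hypothetical. Both orders keep the same winding.

```python
def light_sum(light):
    """Sun + block light from a packed light byte."""
    return ((light >> 4) & 0xF) + (light & 0xF)


def should_flip(lights, aos):
    """Return 1 when the 0-2 diagonal is strictly brighter than the 1-3 diagonal."""
    diag1 = light_sum(lights[0]) + light_sum(lights[2]) + aos[0] + aos[2]
    diag2 = light_sum(lights[1]) + light_sum(lights[3]) + aos[1] + aos[3]
    return int(diag1 > diag2)


def quad_indices(flip_id):
    """Hypothetical index orders: split along corners 0-2, or 1-3 when flipped."""
    return (0, 1, 2, 0, 2, 3) if flip_id == 0 else (1, 2, 3, 1, 3, 0)


uniform = should_flip([0xF0] * 4, [3, 3, 3, 3])
uneven = should_flip([0xF0, 0x00, 0xF0, 0x00], [3, 3, 3, 3])
assert uniform == 0 and quad_indices(uniform) == (0, 1, 2, 0, 2, 3)
assert uneven == 1 and quad_indices(uneven) == (1, 2, 3, 1, 3, 0)
```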

Water Faces Handling

Water is rendered separately to allow transparency blending without depth-test complications.

Algorithm:

  1. During greedy meshing, water faces (voxel_id == WATER) are marked separately

  2. Opaque faces emitted first, water faces appended to same buffer

  3. Render call splits: draw opaque faces first (full depth test), then draw water faces (transparency blending enabled)

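Separate render calls (Python, ModernGL):

# Counts are per face; each quad is drawn as two triangles (6 vertices or indices)
opaque_vertices = opaque_count * 6
water_vertices = water_count * 6

# Opaque pass
ctx.enable(moderngl.DEPTH_TEST)
ctx.disable(moderngl.BLEND)
vao.render(mode=moderngl.TRIANGLES, vertices=opaque_vertices)

# Water pass (water faces start right after the opaque ones in the buffer)
ctx.enable(moderngl.BLEND)
ctx.blend_func = (moderngl.SRC_ALPHA, moderngl.ONE_MINUS_SRC_ALPHA)
vao.render(mode=moderngl.TRIANGLES, vertices=water_vertices, first=opaque_vertices)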

Mesh Building Pipeline (CPU to GPU)

Sequential Process:

  1. Chunk Load (Background Thread):

    • Generate or fetch voxel data from database

    • Place in load_queue

  2. Mesh Build (Background Thread via ThreadPoolExecutor):

    • Pop chunk from load_queue

    • Run greedy meshing: build_chunk_mesh(chunk_voxels, chunk_lightmap)

    • Output: vertex_data (flat uint32 array), light_data (flat uint32 array)

    • Place in build_queue

  3. GPU Upload (Main Thread):

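    • Pop from mesh_queue, which holds the build_queue results after lighting stitching

    • Create the VBO and VAO: vbo = ctx.buffer(vertex_data), then vao = ctx.vertex_array(program, [(vbo, '1u', 'packed_data')]) (the attribute name is illustrative)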

    • Store in chunk.mesh object

    • If VBO pool available, reuse; else allocate new

  4. Rendering (Main Thread, per frame):

    • Frustum cull active chunks

    • Run occlusion queries to skip chunks hidden behind other geometry

    • Bind shader, draw visible chunk meshes
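The hand-off between background threads and the main thread can be sketched without a GPU. This is a simplified illustration under stated assumptions: build_chunk_mesh is a stand-in for the greedy mesher, the lighting-stitching stage between build_queue and mesh_queue is omitted, and max_per_frame is a hypothetical throttle that is not described in the source.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

import numpy as np

CHUNK_SIZE = 48
load_queue = queue.Queue()
mesh_queue = queue.Queue()


def build_chunk_mesh(chunk_voxels, chunk_lightmap):
    # Stand-in for the greedy mesher: one packed word per non-air voxel
    count = int(np.count_nonzero(chunk_voxels))
    return np.zeros(count, dtype=np.uint32), np.zeros(count, dtype=np.uint32)


def mesh_worker():
    """Background thread: turn loaded chunks into packed mesh arrays."""
    while True:
        item = load_queue.get()
        if item is None:  # sentinel: no more work
            break
        chunk_pos, voxels, lightmap = item
        vertex_data, light_data = build_chunk_mesh(voxels, lightmap)
        mesh_queue.put((chunk_pos, vertex_data, light_data))


def upload_pending(max_per_frame=4):
    """Main thread: drain a bounded number of finished meshes per frame."""
    uploaded = []
    for _ in range(max_per_frame):
        try:
            chunk_pos, vertex_data, light_data = mesh_queue.get_nowait()
        except queue.Empty:
            break
        # A real renderer would create the VBO/VAO here
        uploaded.append((chunk_pos, len(vertex_data)))
    return uploaded


with ThreadPoolExecutor(max_workers=2) as executor:
    for _ in range(2):
        executor.submit(mesh_worker)
    for i in range(3):
        voxels = np.zeros((CHUNK_SIZE,) * 3, dtype=np.uint8)
        voxels.flat[: i + 1] = 1  # chunk i has i + 1 solid voxels
        load_queue.put(((i, 0, 0), voxels, np.zeros_like(voxels)))
    for _ in range(2):
        load_queue.put(None)

uploaded = sorted(upload_pending(max_per_frame=8))
assert uploaded == [((0, 0, 0), 1), ((1, 0, 0), 2), ((2, 0, 0), 3)]
```

Keeping all GPU object creation inside upload_pending reflects the constraint in the list above: an OpenGL context is normally bound to one thread, so workers only produce plain NumPy arrays.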

VBO Pool (Memory Recycling):

vbo_pool = []  # List of unused VBOs
VBO_POOL_CAP = 150

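def get_or_create_vbo(ctx, data):
    if vbo_pool:
        vbo = vbo_pool.pop()
        size = memoryview(data).nbytes
        if vbo.size < size:
            vbo.orphan(size)  # Reallocate: write() cannot grow a buffer
        vbo.write(data)  # Overwrite with new data
    else:
        vbo = ctx.buffer(data)
    return vbo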

def release_vbo(vbo):
    if len(vbo_pool) < VBO_POOL_CAP:
        vbo_pool.append(vbo)
    else:
        vbo.release()  # Destroy GPU memory

Data Flow Example

Raw Chunk Voxels (1D array, 110,592 elements)
         ↓
[Greedy Meshing: CPU]
         ↓
Packed Vertex Data (e.g., 10,000 vertices for a grass chunk)
         ↓
[Lighting Stitching: CPU]
         ↓
Light Data (10,000 light values)
         ↓
[GPU Upload: Main Thread]
         ↓
VBO/VAO allocated on GPU
         ↓
[Rendering: per frame]
         ↓
Vertices unpacked in Vertex Shader → Position + Attributes
         ↓
Fragment Shader colors pixels
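The sizes in this flow can be worked out directly. The arithmetic below uses only figures quoted in this document (48-voxel chunks, the 10,000-vertex example, 4 bytes per packed vertex, 4 bytes per light value, and the 16-byte unpacked figure); it is an illustration, not a measurement.

```python
CHUNK_SIZE = 48
voxels_per_chunk = CHUNK_SIZE ** 3

vertex_count = 10_000          # example figure from the flow above
packed_vertex_bytes = 4        # one uint32 of packed attributes
light_bytes = 4                # one uint32 of light data
unpacked_vertex_bytes = 16     # figure quoted for the unpacked layout

packed_total = vertex_count * (packed_vertex_bytes + light_bytes)
unpacked_total = vertex_count * unpacked_vertex_bytes

assert voxels_per_chunk == 110_592
assert packed_total == 80_000
assert unpacked_total == 160_000
```

With the light value counted, the packed layout is half the size of the 16-byte layout for the same vertex count.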

Custom Mesh Variants

CloudMesh:

Fixed procedural clouds (no greedy meshing). Uses 2D Simplex noise to determine cloud density at each point. Emits simplified geometry.

ItemMesh:

Dropped items (pickaxe, stick, etc.) use simplified meshes. No greedy meshing; pre-defined vertex data per item type.

ObjMesh:

Loads Wavefront .obj files (trees, decorative structures). Parses vertices, UVs, normals. Stores as-is; no meshing.
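For reference, a minimal parser for the subset of the .obj format mentioned here (vertices, UVs, normals, faces). It is a sketch, not the engine's loader: parse_obj is an illustrative name, polygons are fan-triangulated, and negative (relative) indices, materials, and groups are not handled.

```python
def parse_obj(text):
    """Parse positions, UVs, normals and triangles from Wavefront .obj text."""
    positions, uvs, normals, faces = [], [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith('#'):
            continue
        tag, args = parts[0], parts[1:]
        if tag == 'v':
            positions.append(tuple(float(a) for a in args[:3]))
        elif tag == 'vt':
            uvs.append(tuple(float(a) for a in args[:2]))
        elif tag == 'vn':
            normals.append(tuple(float(a) for a in args[:3]))
        elif tag == 'f':
            corners = []
            for ref in args:
                # Each corner is v, v/vt, v//vn or v/vt/vn; indices are 1-based
                idx = [int(p) - 1 if p else None for p in ref.split('/')]
                idx += [None] * (3 - len(idx))
                corners.append(tuple(idx))
            # Fan-triangulate polygons with more than three corners
            for i in range(1, len(corners) - 1):
                faces.append((corners[0], corners[i], corners[i + 1]))
    return positions, uvs, normals, faces


OBJ_TEXT = """\
# unit quad
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
vt 0 0
vt 1 0
vt 1 1
vt 0 1
vn 0 0 1
f 1/1/1 2/2/1 3/3/1 4/4/1
"""

positions, uvs, normals, faces = parse_obj(OBJ_TEXT)
assert len(positions) == 4 and len(uvs) == 4 and len(normals) == 1
assert faces == [
    ((0, 0, 0), (1, 1, 0), (2, 2, 0)),
    ((0, 0, 0), (2, 2, 0), (3, 3, 0)),
]
```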

Replication Guide

To reimplement greedy meshing from scratch:

  1. Load voxel data into 3D array or flattened 1D array

  2. For each of 3 planes:

    1. Build 2D solid/empty mask (iterate through slice)

    2. Extract rectangles greedily (nested loop with width/height expansion)

    3. For each rectangle, calculate 4 corner lights and AO values

    4. Pack into 32-bit integers

    5. Emit 2 triangles (6 indices) for the quad

  3. Separate water faces from opaque

  4. Upload to GPU as VBO

  5. Render with phased passes (opaque → transparent)

Pseudocode:

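def build_mesh(chunk_voxels):
    # Helper functions below are placeholders for the steps described above
    vertex_data = []
    light_data = []

    # Process each plane
    for plane in ['X', 'Y', 'Z']:
        for slice_idx in range(CHUNK_SIZE):
            mask = build_mask(chunk_voxels, plane, slice_idx)

            processed = set()
            for start_y, start_z in iterate_mask(mask):
                if (start_y, start_z) in processed:
                    continue

                # greedy_expand adds every cell it covers to processed
                width, height = greedy_expand(mask, start_y, start_z, processed)

                # Corners depend on the rectangle's extent, not just its origin
                corners_light = sample_4_corners(slice_idx, start_y, start_z, width, height)
                corners_ao = sample_4_corners_ao(slice_idx, start_y, start_z, width, height)
                flip = compute_flip(corners_light, corners_ao)

                packed = pack_vertices(slice_idx, start_y, start_z, width, height, corners_ao, flip)
                vertex_data.extend(packed)
                # One light value per emitted vertex, in the same order
                light_data.extend(expand_corner_lights(corners_light, flip))

    return np.array(vertex_data, dtype=np.uint32), np.array(light_data, dtype=np.uint32)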