.. _meshes:

Mesh Systems and Vertex Packing
================================

This document provides a complete, replicable guide to Pyrite's mesh architecture, including data structures, vertex packing, the greedy meshing algorithm, and ambient occlusion (AO) calculation.

Overview
--------

Pyrite renders millions of voxels using a multi-stage mesh pipeline:

1. **Greedy Meshing:** CPU-side algorithm groups adjacent coplanar faces into large rectangular polygons.
2. **Vertex Packing:** Vertex attributes compressed into 32-bit integers to minimize GPU memory.
3. **Lighting:** Per-vertex smoothed light values sampled from adjacent blocks.
4. **AO Calculation:** Corner darkness determined by surrounding block density.
5. **GPU Upload:** Main thread creates VAO/VBO objects from packed data.
6. **Rendering:** Draw calls per mesh (opaque + transparent passes).

Mesh Classes Hierarchy
----------------------

**BaseMesh** (Core Abstract Class)

Base class for all mesh types. Defines the interface:

.. code-block:: text

    class BaseMesh:
        __init__(self, ctx, program):
            ctx: ModernGL context
            program: Shader program to bind during render
            self.vao: Vertex Array Object (initially None)
            self.vbo: Vertex Buffer Object (initially None)
            self.vertex_count: int
            self.index_count: int
            self.render_mode: GLenum (GL_TRIANGLES default)
        
        render():
            # Bind program, VAO, draw (implementation varies)
        
        render_instanced():
            # Render multiple instances
        
        destroy():
            # Release GPU resources

**ChunkMesh** (Per-Chunk Geometry)

Represents a single 48x48x48 chunk. Created during chunk building.

.. code-block:: text

    class ChunkMesh(BaseMesh):
        __init__(self, chunk_voxels, chunk_lightmap, chunk_pos, ctx, program, world_data):
            # Greedy mesh the chunk and store vertex data
            self.chunk_pos: (int, int, int)  # Chunk coordinates in world
            self.vertex_data: np.ndarray (uint32, packed vertices)
            self.light_data: np.ndarray (uint32, packed light)
            self.opaque_count, self.water_count: Face counts
        
        render():
            # Render opaque faces, then water faces with separate passes

**CloudMesh, CubeMesh, ItemMesh, ObjMesh** (Specialized)

- **CloudMesh:** Fixed 2D procedural clouds (sky)
- **CubeMesh:** Renders static cubes (UI, debugging)
- **ItemMesh:** Item entities dropped in world
- **ObjMesh:** Wavefront .obj models (trees, items)

Vertex Packing: 32-Bit Format
------------------------------

To minimize GPU bandwidth and memory, each vertex attribute is bit-packed into a single 32-bit unsigned integer.

**Packed Vertex Layout:**

.. code-block:: text

    Bits 31-26 (6 bits): X coordinate (0-47)
    Bits 25-20 (6 bits): Y coordinate (0-47)
    Bits 19-14 (6 bits): Z coordinate (0-47)
    Bits 13-6  (8 bits): Voxel ID (0-255)
    Bits 5-3   (3 bits): Face ID (0-5, one of 6 faces)
    Bits 2-1   (2 bits): AO ID (0-3, ambient occlusion level)
    Bit  0     (1 bit):  Flip ID (0 or 1, diagonal flip flag)

**Total:** 32 bits = 4 bytes per vertex (vs. 16 bytes for traditional (x, y, z, id, face, ao, flip, light))

**Packing Formula:**

.. code-block:: python

    packed_data = (x & 0x3F) << 26 \
                | (y & 0x3F) << 20 \
                | (z & 0x3F) << 14 \
                | (voxel_id & 0xFF) << 6 \
                | (face_id & 0x7) << 3 \
                | (ao_id & 0x3) << 1 \
                | (flip_id & 0x1)

**Unpacking (in Vertex Shader):**

.. code-block:: glsl

    void unpack(uint packed_data) {
        x = int((packed_data >> 26) & 0x3F);
        y = int((packed_data >> 20) & 0x3F);
        z = int((packed_data >> 14) & 0x3F);
        voxel_id = int((packed_data >> 6) & 0xFF);
        face_id = int((packed_data >> 3) & 0x7);
        ao_id = int((packed_data >> 1) & 0x3);
        flip_id = int(packed_data & 0x1);
    }

**Light Data (Separate uint32):**

.. code-block:: text

    Bits 7-4 (4 bits): Sunlight (0-15)
    Bits 3-0 (4 bits): Blocklight (0-15)

Greedy Meshing Algorithm
------------------------

Greedy meshing reduces face count by grouping coplanar, identical-ID faces into rectangles. Executed on CPU; results are packed and uploaded to GPU.

**High-Level Steps:**

1. For each of 3 orthogonal planes (XY, XZ, YZ):

   a. Iterate through all slices perpendicular to that plane
   b. Build 2D mask of solid vs. transparent voxels
   c. For each solid voxel with exposed face:
      
      - Calculate AO and light for all 4 corners
      - Find greedy horizontal rectangle width
      - Find greedy vertical rectangle height
      - Emit quad vertices
      
   d. Mark processed faces to avoid double-processing

2. Separate opaque and water faces into independent buffers

**Detailed Algorithm: X-Plane Scanning**

Processing YZ-plane slices (X varying):

.. code-block:: python

    for x_slice in 0 to CHUNK_SIZE-1:
        # Build 2D mask of YZ values (which are solid and exposed on +X face)
        mask = np.zeros((CHUNK_SIZE, CHUNK_SIZE), dtype=bool)
        
        for y in 0 to CHUNK_SIZE-1:
            for z in 0 to CHUNK_SIZE-1:
                voxel_id = chunk_voxels[x_slice, y, z]
                
                # Check if solid and has exposed +X face
                if is_solid(voxel_id):
                    if x_slice == CHUNK_SIZE-1 or not is_solid(chunk_voxels[x_slice+1, y, z]):
                        mask[y, z] = True
        
        # Greedy rectangle extraction from mask
        for y in 0 to CHUNK_SIZE-1:
            for z in 0 to CHUNK_SIZE-1:
                if not mask[y, z]:
                    continue
                
                # Find greedy width (extend along z-axis)
                width = 1
                while z + width < CHUNK_SIZE and mask[y, z + width]:
                    width += 1
                
                # Find greedy height (extend along y-axis)
                height = 1
                valid = True
                while y + height < CHUNK_SIZE and valid:
                    for z_check in z to z + width - 1:
                        if not mask[y + height, z_check]:
                            valid = False
                            break
                    if valid:
                        height += 1
                
                # Mark processed to avoid overlap
                for dy in 0 to height-1:
                    for dz in 0 to width-1:
                        mask[y + dy, z + dz] = False
                
                # Get 4 corner light/AO values
                l0 = get_vertex_light(x_slice, y, z)
                l1 = get_vertex_light(x_slice, y+height, z)
                l2 = get_vertex_light(x_slice, y+height, z+width)
                l3 = get_vertex_light(x_slice, y, z+width)
                
                ao0 = get_ao((x_slice, y, z))
                ao1 = get_ao((x_slice, y+height, z))
                ao2 = get_ao((x_slice, y+height, z+width))
                ao3 = get_ao((x_slice, y, z+width))
                
                # Flip detection (see section below)
                flip_id = should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3)
                
                # Emit 2 triangles (6 indices)
                emit_quad(x_slice, y, z, width, height, flip_id, l0, l1, l2, l3)

**Y and Z Plane Scanning** work similarly, iterating through XZ and XY slices respectively.

**Performance Note:** This is a hot loop executed once per chunk load. Implemented in Numba with ``@njit(cache=True, nogil=True)`` for 500x+ speedup.

Vertex Light Smoothing
----------------------

Light values are interpolated to vertices for smooth shading. Each vertex is shared by up to 8 blocks; we sample light from the 4 (on a plane) or 8 blocks surrounding that vertex.

**For X-Plane Face (perpendicular normal = +X):**

Four corners of the quad correspond to YZ positions. For each corner, sample from 4 blocks:

.. code-block:: python

    def get_vertex_light(x, y, z, plane='X'):
        # plane='X' means YZ quad; sample from 4 blocks around corner
        
        if plane == 'X':
            # Corner at (x, y, z) in YZ space samples:
            l1 = get_light(x, y, z)           # Lower-left
            l2 = get_light(x, y+1, z)         # Upper-left
            l3 = get_light(x, y+1, z+1)       # Upper-right
            l4 = get_light(x, y, z+1)         # Lower-right
        elif plane == 'Y':
            # Similar for XZ plane
            l1 = get_light(x, y, z)
            l2 = get_light(x+1, y, z)
            l3 = get_light(x+1, y, z+1)
            l4 = get_light(x, y, z+1)
        # ... etc for Z plane
        
        # Average light (simple mean, or weighted by AO)
        avg_sun = (l1 >> 4 + l2 >> 4 + l3 >> 4 + l4 >> 4) / 4
        avg_block = ((l1 & 15) + (l2 & 15) + (l3 & 15) + (l4 & 15)) / 4
        
        return (avg_sun << 4) | avg_block

Ambient Occlusion (AO) Calculation
-----------------------------------

AO darkens corners where multiple solid blocks converge, simulating soft shadows.

**Corner Occlusion (for X-plane, Y-Z corner):**

For each of the 4 corners of a quad, check 2x2 adjacent blocks:

.. code-block:: python

    def get_ao(corner_y, corner_z, plane='X'):
        # plane='X': Check blocks in YZ plane around corner
        
        # Top-left, Top-right, Bottom-left, Bottom-right (relative to corner)
        ao_count = 0
        
        if not is_transparent(voxel_at(corner_y-1, corner_z-1)):
            ao_count += 1
        if not is_transparent(voxel_at(corner_y, corner_z-1)):
            ao_count += 1
        if not is_transparent(voxel_at(corner_y-1, corner_z)):
            ao_count += 1
        if not is_transparent(voxel_at(corner_y, corner_z)):
            ao_count += 1
        
        # ao_count ranges 0-4, but we store only 0-3
        # 0 = bright, 1 = slightly dark, 2 = moderately dark, 3 = very dark
        return min(ao_count, 3)

**Transparency Check:**

Transparent blocks (AIR, WATER, GLASS, LEAVES) do not cast AO shadows:

.. code-block:: python

    def is_transparent(voxel_id):
        return voxel_id in [AIR, WATER, GLASS, LEAVES]

**GPU Application (in Vertex Shader):**

.. code-block:: glsl

    const float ao_values[4] = float[4](0.1, 0.25, 0.5, 1.0);
    
    // Unpack ao_id (2 bits)
    int ao_id = int((packed_data >> 1) & 0x3);
    
    // Apply to shading
    shading = base_light * ao_values[ao_id];

Flip Detection (Diagonal Flip for Lighting)
--------------------------------------------

When lighting is uneven across a quad, flipping the diagonal can improve visual appearance. This is determined by comparing lighting sums across the two diagonals.

**Algorithm:**

.. code-block:: python

    def should_flip_diagonal(l0, l1, l2, l3, ao0, ao1, ao2, ao3):
        # l0, l1, l2, l3 = light at 4 corners (packed uint32)
        # ao0, ao1, ao2, ao3 = AO at 4 corners
        
        # Extract sun and block light
        def extract_light(l):
            return (l >> 4) + (l & 15)  # sun + block (simplified)
        
        # Diagonal 1: (0,2) and Diagonal 2: (1,3)
        diag1_brightness = extract_light(l0) + extract_light(l2) + (ao0 + ao2)
        diag2_brightness = extract_light(l1) + extract_light(l3) + (ao1 + ao3)
        
        # Flip if diagonal 1 is brighter (optimization: break ties toward standard diagonal)
        return diag1_brightness > diag2_brightness

**GPU Application:**

During rendering, the vertex shader uses ``flip_id`` to adjust vertex positions or UV coordinates accordingly.

Water Faces Handling
--------------------

Water is rendered separately to allow transparency blending without depth-test complications.

**Algorithm:**

1. During greedy meshing, water faces (voxel_id == WATER) are marked separately
2. Opaque faces emitted first, water faces appended to same buffer
3. Render call splits: draw opaque faces first (full depth test), then draw water faces (transparency blending enabled)

**Separate render call (in Shader Program):**

.. code-block:: python

    # Opaque pass
    ctx.enable(moderngl.DEPTH_TEST)
    ctx.disable(moderngl.BLEND)
    vao.render(mode=moderngl.TRIANGLES, vertices=opaque_count)

    # Water pass
    ctx.enable(moderngl.BLEND)
    ctx.blend_func = (moderngl.SRC_ALPHA, moderngl.ONE_MINUS_SRC_ALPHA)
    vao.render(mode=moderngl.TRIANGLES, vertices=water_count, first=opaque_count)

Mesh Building Pipeline (CPU to GPU)
-----------------------------------

**Sequential Process:**

1. **Chunk Load (Background Thread):**

   - Generate or fetch voxel data from database
   - Place in ``load_queue``

2. **Mesh Build (Background Thread via ThreadPoolExecutor):**

   - Pop chunk from ``load_queue``
   - Run greedy meshing: ``build_chunk_mesh(chunk_voxels, chunk_lightmap)``
   - Output: ``vertex_data`` (flat uint32 array), ``light_data`` (flat uint32 array)
   - Place in ``build_queue``

3. **GPU Upload (Main Thread):**

   - Pop from ``mesh_queue`` (result of lighting stitching in ``build_queue``)
   - Create VAO/VBO: ``ctx.vertex_array(program, vbo, vao)``
   - Store in ``chunk.mesh`` object
   - If VBO pool available, reuse; else allocate new

4. **Rendering (Main Thread, per frame):**

   - Frustum cull active chunks
   - Occlusion query invisible chunks
   - Bind shader, draw visible chunk meshes

**VBO Pool (Memory Recycling):**

.. code-block:: python

    vbo_pool = []  # List of unused VBOs
    VBO_POOL_CAP = 150
    
    def get_or_create_vbo(ctx, data):
        if vbo_pool:
            vbo = vbo_pool.pop()
            vbo.write(data)  # Overwrite with new data
        else:
            vbo = ctx.buffer(data)
        return vbo
    
    def release_vbo(vbo):
        if len(vbo_pool) < VBO_POOL_CAP:
            vbo_pool.append(vbo)
        else:
            vbo.release()  # Destroy GPU memory

Data Flow Example
-----------------

.. code-block:: text

    Raw Chunk Voxels (1D array, 110,592 elements)
             ↓
    [Greedy Meshing: CPU]
             ↓
    Packed Vertex Data (e.g., 10,000 vertices for a grass chunk)
             ↓
    [Lighting Stitching: CPU]
             ↓
    Light Data (10,000 light values)
             ↓
    [GPU Upload: Main Thread]
             ↓
    VBO/VAO allocated on GPU
             ↓
    [Rendering: per frame]
             ↓
    Vertices unpacked in Vertex Shader → Position + Attributes
             ↓
    Fragment Shader colors pixels

Custom Mesh Variants
--------------------

**CloudMesh:**

Fixed procedural clouds (no greedy meshing). Uses 2D Simplex noise to determine cloud density at each point. Emits simplified geometry.

**ItemMesh:**

Dropped items (pickaxe, stick, etc.) use simplified meshes. No greedy meshing; pre-defined vertex data per item type.

**ObjMesh:**

Loads Wavefront .obj files (trees, decorative structures). Parses vertices, UVs, normals. Stores as-is; no meshing.

Replication Guide
-----------------

**To reimplement greedy meshing from scratch:**

1. Load voxel data into 3D array or flattened 1D array
2. For each of 3 planes:
   a. Build 2D solid/empty mask (iterate through slice)
   b. Extract rectangles greedily (nested loop with width/height expansion)
   c. For each rectangle, calculate 4 corner lights and AO values
   d. Pack into 32-bit integers
   e. Emit 2 triangles (6 indices) for the quad
3. Separate water faces from opaque
4. Upload to GPU as VBO
5. Render with phased passes (opaque → transparent)

**Pseudocode:**

.. code-block:: python

    def build_mesh(chunk_voxels):
        vertex_data = []
        light_data = []
        
        # Process each plane
        for plane in ['X', 'Y', 'Z']:
            for slice_idx in range(CHUNK_SIZE):
                mask = build_mask(chunk_voxels, plane, slice_idx)
                
                processed = set()
                for start_y, start_z in iterate_mask(mask):
                    if (start_y, start_z) in processed:
                        continue
                    
                    width, height = greedy_expand(mask, start_y, start_z, processed)
                    
                    corners_light = sample_4_corners(start_y, start_z)
                    corners_ao = sample_4_corners_ao(start_y, start_z)
                    flip = compute_flip(corners_light, corners_ao)
                    
                    packed = pack_vertices(slice_idx, start_y, start_z, width, height, flip)
                    vertex_data.extend(packed)
        
        return np.array(vertex_data, dtype=np.uint32), np.array(light_data, dtype=np.uint32)