OpenCode Skill

Compare Billing Logic

An AI skill that compares SQL billing logic between the Variable Billing and Lighthouse repos — at the metric level — and explains why the numbers may differ.

50+ Products Supported
13/13 Quality Score
2 Output Modes
5 Workflow Phases
~45m Development Time

Why does Lighthouse show different numbers than Variable Billing?

This is one of the questions I get most often on the DIA team. Answering it requires manually digging into two separate repos, comparing the SQL logic, and explaining the differences, every single time.

🕑 Time-consuming manual process

Each comparison requires opening both repos, finding the right SQL files, reading through the logic, and mentally diffing them. This can take 30-60 minutes per product.

🔍 Finding the right files is tricky

Product names differ between repos. Lighthouse may have multiple SQL files per product serving different purposes. The variable_billing folder in Lighthouse is a red herring.

💬 Explaining to non-technical stakeholders

Stakeholders don't want to hear about JOIN conditions and WHERE clauses. They need a simple answer: "here's why the numbers are different for this customer."

🔄 Repetitive across products

The same question comes up for Workers, R2, API Shield, DNS, Stream, and dozens of other products. The investigation pattern is identical each time.

An AI skill that does the comparison for you

Give it a product name and (optionally) a customer ID. It fetches SQL from both repos, compares at the metric level, and delivers a structured report with validation queries.

1. Cross-repo SQL fetching

Reads variable-usage SQL locally, fetches lighthouse-ingestion SQL via GitLab API. Handles name mismatches automatically.

2. Metric-level matching

Doesn't just compare filenames — scans ALL SQL files, extracts output metrics, and matches them across repos by name similarity.

3. Customer-scoped queries

Provide an sf_account_id and all validation queries are pre-filtered to that customer — ready to paste into BigQuery.

4. Two output modes

Concise mode for stakeholders (plain language, no jargon). Verbose mode for engineers (code snippets, metric maps).

5. Validation queries included

Generates ready-to-run BigQuery SQL that shows both systems' data side-by-side, plus a diff query highlighting mismatches.

6. Smart exclusions

Automatically skips the variable_billing/ folder in Lighthouse (which just copies VB output) and compares against the actual independent logic.
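As a rough sketch of the metric-level matching idea (not the skill's actual implementation; the alias-extraction regex and the 0.8 similarity threshold are simplifying assumptions):

```python
import re
from difflib import SequenceMatcher

def extract_metrics(sql):
    """Collect output column aliases (... AS name) from a SQL string.

    Simplified: a real parser would skip CAST(... AS type) and CTE names.
    """
    return {m.lower() for m in re.findall(r"\bAS\s+([A-Za-z_]\w*)", sql, re.IGNORECASE)}

def match_metrics(vb_metrics, lh_metrics, threshold=0.8):
    """Pair each VB metric with its most name-similar Lighthouse metric."""
    pairs = []
    for vb in sorted(vb_metrics):
        best = max(lh_metrics, key=lambda lh: SequenceMatcher(None, vb, lh).ratio(), default=None)
        if best is not None and SequenceMatcher(None, vb, best).ratio() >= threshold:
            pairs.append((vb, best))
    return pairs

vb = extract_metrics("SELECT SUM(requests) AS sum_requests, zone_id AS zone FROM t")
lh = extract_metrics("SELECT COUNT(*) AS requests, zone_id AS zone FROM t")
print(match_metrics(vb, lh))  # [('sum_requests', 'requests'), ('zone', 'zone')]
```

Matching by name similarity rather than filename is what lets the skill pair `sum_requests` in one repo with `requests` in another, even when they live in differently named files.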

5-phase workflow

Phase 1: Resolve Product

Fuzzy-match the product name to directories in both repos

Phase 2: Fetch & Inventory

Read all SQL files, extract output metrics from each

Phase 3: Compare Metrics

Match metrics across repos, compare source tables, filters, formulas

Phase 4: Verdict & Report

Classify as match / minor / significant, format per output mode

Phase 5: Validation Queries

Generate ready-to-run BQ SQL filtered to the customer
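Phase 1's fuzzy product resolution can be sketched with the standard library; the directory names below are illustrative, not the real repo layout:

```python
from difflib import get_close_matches

def resolve_product(name, repo_dirs):
    """Fuzzy-match a user-supplied product name to a repo directory name."""
    normalized = name.lower().replace(" ", "_")
    hits = get_close_matches(normalized, repo_dirs, n=1, cutoff=0.6)
    return hits[0] if hits else None

# Hypothetical directory listing; the actual repos differ.
dirs = ["r2", "workers", "api_shield", "dns", "stream"]
print(resolve_product("API Shield", dirs))  # api_shield
print(resolve_product("Workrs", dirs))      # workers (tolerates typos)
```

The same resolution would run against both repos, since the two sides may use different directory names for the same product.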

One skill, two audiences

Concise (default)

For Stakeholders

Non-technical — Slack-ready plain language
  • One-line verdict in plain English
  • 2-4 bullet points explaining why numbers differ
  • Business terms, not SQL jargon
  • One validation query with instructions

Verbose

For Engineers

Technical — full metric map with code
  • All files compared with metric attribution
  • Metric map table: VB vs Lighthouse per column
  • Code snippets showing exact divergences
  • Three validation queries (VB, Lighthouse, diff)

Real-world simulations

The skill was validated against two products with very different characteristics.

Simulation 1 — Verbose Mode

R2 Storage

Significant Differences
  • Same source table (ida_base.r2_account_metrics_daily)
  • VB is daily, Lighthouse aggregates monthly
  • Different customer scoping filters
  • VB tracks IA-tier metrics; Lighthouse doesn't
  • Storage units differ (raw bytes vs GB-months)
  • Lighthouse adds projections, caps, entitlements

Simulation 2 — Concise + sf_account_id

API Shield

Significant Differences
  • Different source tables entirely
  • VB measures billable request volume
  • Lighthouse measures feature adoption + total requests (in separate files)
  • Exposed the file-matching flaw — led to v3 metric-level matching
  • Tested with sf_account_id = 001o000000aLh2XAAS

Concise mode output for API Shield

Here's what the skill actually produced when asked to compare API Shield for a specific customer. This is what you'd paste into Slack.

API Shield sf_account_id = 001o000000aLh2XAAS
Numbers are expected to differ (different data sources and measurement methods)
  • The billing system measures billable API request volume — it counts HTTP requests on zones with an API Shield subscription, filtered to specific request types, and only for enterprise zones
  • Lighthouse measures two different things: API Shield feature adoption (endpoints discovered, schemas uploaded) and total request volume by status code — it counts all HTTP requests on all active zones, with no subscription or request-type filtering
  • The request counts in billing vs Lighthouse come from different underlying data sources that measure HTTP traffic differently, so exact matches are not expected even for the same zone
  • If you're comparing request numbers, expect Lighthouse to show higher counts because it includes all request types and all zone plans, while billing only counts specific filtered requests on enterprise zones
A ready-to-run validation query is also included; paste it into BigQuery to see the data side-by-side for this customer.
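For illustration, a customer-scoped side-by-side query could be assembled along these lines (the table and column names are invented placeholders; the skill derives the real ones from the repos' SQL):

```python
def validation_query(vb_table, lh_table, sf_account_id):
    """Build a side-by-side BigQuery comparison filtered to one customer.

    Placeholder schema: both tables are assumed to expose usage_date,
    metric_value, and sf_account_id columns.
    """
    return f"""
SELECT vb.usage_date,
       vb.metric_value AS vb_value,
       lh.metric_value AS lh_value,
       vb.metric_value - lh.metric_value AS diff
FROM `{vb_table}` AS vb
JOIN `{lh_table}` AS lh
  ON vb.usage_date = lh.usage_date AND vb.sf_account_id = lh.sf_account_id
WHERE vb.sf_account_id = '{sf_account_id}'
ORDER BY diff DESC
""".strip()

print(validation_query("proj.vb.usage_daily", "proj.lh.usage_daily", "001o000000aLh2XAAS"))
```

Pre-filtering on sf_account_id is what makes the output paste-ready: the stakeholder's customer is already baked into the WHERE clause.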

How we got here

The skill went through 3 major iterations, each driven by real-world testing and feedback.

Version 1.0
File-level comparison with 11-section report
Initial draft. Compared one SQL file from each repo. Output was an 11-section technical report covering source tables, date filtering, customer filters, metrics, aggregation, config, and more. Validated against R2 product.
+ Core workflow + variable_billing/ exclusion + Decision rules
Version 2.0
Customer-scoped queries + concise/verbose modes
Added sf_account_id support, since the most common use case is a stakeholder asking about one specific customer. Introduced concise mode (3 sections) and trimmed verbose mode from 11 sections to 5, because the v1 report was still too long.
+ sf_account_id filtering + Concise / verbose modes ~ Report trimmed 11 → 5 sections
Version 3.0 — Corrections
Metric-level matching + non-technical concise output
Testing with API Shield revealed a critical flaw: the skill was matching by filename, but the relevant "requests" metric was in a different file (api_request_status_by_zone_monthly_query.sql) than the main product file. Switched to metric-level matching across ALL SQL files. Also rewrote concise mode tone guidelines for non-technical stakeholders.
! Fixed file-level matching flaw ~ Metric-level matching across all files ~ Concise mode: non-technical tone + Tone guidelines for plain language
Published
Quality score: 13/13 — Published to Skills API
Final version validated and published. Available as a local skill in the variable-usage repo and via the Skills API registry.
+ Published to Skills API + Saved locally to .opencode/skills/

What we caught and fixed during development

! variable_billing/ trap

Lighthouse has a queries/bigquery/variable_billing/ folder that copies VB output. Comparing against it would always show "match" — misleading. Added explicit exclusion rule.

! File-level matching failure

API Shield simulation showed that the "requests" metric lived in a different file than expected. Switched from file-level to metric-level matching across all SQL files.

! Too-technical concise output

First concise mode still used SQL jargon. Added explicit tone guidelines: say "billing system" not "variable-usage pipeline", "enterprise zones" not "zone_plan = ENT".

OpenCode patted itself on the back — but it was wrong!

...but it also took feedback well!

After the first API Shield simulation, OpenCode declared success — saying the skill "handles well" the case where pipelines measure completely different things. But it missed that a different lighthouse file actually had the comparable requests metric. One nudge later, the entire matching approach was redesigned.

This was a particularly interesting test case because it revealed something the skill handles well: when the two pipelines measure completely different things for the same product. The skill correctly identified that VB tracks billable request volume while Lighthouse tracks feature adoption metrics, and adapted the validation query to be a diagnostic "what does each pipeline see" query rather than a traditional diff.
⚠️ What it missed: The lighthouse repo had api_request_status_by_zone_monthly_query.sql with a requests metric that actually was comparable to VB's sum_requests — but the skill never looked at that file because it was matching by filename, not by metric.
The fix: One piece of feedback later, the entire Phase 2 was redesigned — from file-level matching to metric-level matching across ALL SQL files. This became the defining feature of v3.