OpenCode Skill

Compare Billing Logic

An AI skill that compares SQL billing logic between the Variable Billing and Lighthouse repos — at the metric level — and explains why the numbers may differ.

50+ Products Supported
13/13 Quality Score
2 Output Modes
5 Workflow Phases
~45m Development Time

Why does Lighthouse show different numbers than Variable Billing?

This is one of the questions I get most often on the DIA team. Answering it requires manually digging into two separate repos, comparing the SQL logic, and explaining the differences, every single time.

🕑 Time-consuming manual process

Each comparison requires opening both repos, finding the right SQL files, reading through the logic, and mentally diffing them. This can take 30-60 minutes per product.

🔍 Finding the right files is tricky

Product names differ between repos. Lighthouse may have multiple SQL files per product serving different purposes. The variable_billing folder in Lighthouse is a red herring.

💬 Explaining to non-technical stakeholders

Stakeholders don't want to hear about JOIN conditions and WHERE clauses. They need a simple answer: "here's why the numbers are different for this customer."

🔄 Repetitive across products

The same question comes up for Workers, R2, API Shield, DNS, Stream, and dozens of other products. The investigation pattern is identical each time.

An AI skill that does the comparison for you

Give it a product name and (optionally) a customer ID. It fetches SQL from both repos, compares at the metric level, and delivers a structured report with validation queries.

1. Cross-repo SQL fetching

Reads variable-usage SQL locally, fetches lighthouse-ingestion SQL via GitLab API. Handles name mismatches automatically.

2. Metric-level matching

Doesn't just compare filenames — scans ALL SQL files, extracts output metrics, and matches them across repos by name similarity.

3. Customer-scoped queries

Provide an sf_account_id and all validation queries are pre-filtered to that customer — ready to paste into BigQuery.

4. Two output modes

Concise mode for stakeholders (plain language, no jargon). Verbose mode for engineers (code snippets, metric maps).

5. Validation queries included

Generates ready-to-run BigQuery SQL that shows both systems' data side-by-side, plus a diff query highlighting mismatches.

6. Smart exclusions

Automatically skips the variable_billing/ folder in Lighthouse (which just copies VB output) and compares against the actual independent logic.
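As a rough sketch of the metric-level matching idea (not the skill's actual implementation; the alias-extraction regex and the 0.8 similarity threshold are simplifying assumptions):

```python
import re
from difflib import SequenceMatcher

def extract_metrics(sql):
    """Collect output column aliases (... AS name) from a SQL string.

    Simplified: a real parser would skip CAST(... AS type) and CTE names.
    """
    return {m.lower() for m in re.findall(r"\bAS\s+([A-Za-z_]\w*)", sql, re.IGNORECASE)}

def match_metrics(vb_metrics, lh_metrics, threshold=0.8):
    """Pair each VB metric with its most name-similar Lighthouse metric."""
    pairs = []
    for vb in sorted(vb_metrics):
        best = max(lh_metrics, key=lambda lh: SequenceMatcher(None, vb, lh).ratio(), default=None)
        if best is not None and SequenceMatcher(None, vb, best).ratio() >= threshold:
            pairs.append((vb, best))
    return pairs

vb = extract_metrics("SELECT SUM(requests) AS sum_requests, zone_id AS zone FROM t")
lh = extract_metrics("SELECT COUNT(*) AS requests, zone_id AS zone FROM t")
print(match_metrics(vb, lh))  # [('sum_requests', 'requests'), ('zone', 'zone')]
```

Matching by name similarity rather than filename is what lets the skill pair `sum_requests` in one repo with `requests` in another, even when they live in differently named files.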

5-phase workflow

Phase 1: Resolve Product

Fuzzy-match the product name to directories in both repos

Phase 2: Fetch & Inventory

Read all SQL files, extract output metrics from each

Phase 3: Compare Metrics

Match metrics across repos, compare source tables, filters, formulas

Phase 4: Verdict & Report

Classify as match / minor / significant, format per output mode

Phase 5: Validation Queries

Generate ready-to-run BQ SQL filtered to the customer
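Phase 1's fuzzy product resolution can be sketched with the standard library; the directory names below are illustrative, not the real repo layout:

```python
from difflib import get_close_matches

def resolve_product(name, repo_dirs):
    """Fuzzy-match a user-supplied product name to a repo directory name."""
    normalized = name.lower().replace(" ", "_")
    hits = get_close_matches(normalized, repo_dirs, n=1, cutoff=0.6)
    return hits[0] if hits else None

# Hypothetical directory listing; the actual repos differ.
dirs = ["r2", "workers", "api_shield", "dns", "stream"]
print(resolve_product("API Shield", dirs))  # api_shield
print(resolve_product("Workrs", dirs))      # workers (tolerates typos)
```

The same resolution would run against both repos, since the two sides may use different directory names for the same product.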

One skill, two audiences

Concise (default)

For Stakeholders

Non-technical — Slack-ready plain language
  • One-line verdict in plain English
  • 2-4 bullet points explaining why numbers differ
  • Business terms, not SQL jargon
  • One validation query with instructions

Verbose

For Engineers

Technical — full metric map with code
  • All files compared with metric attribution
  • Metric map table: VB vs Lighthouse per column
  • Code snippets showing exact divergences
  • Three validation queries (VB, Lighthouse, diff)

Real-world simulations

The skill was validated against two products with very different characteristics.

Simulation 1 — Verbose Mode

R2 Storage

Significant Differences
  • Same source table (ida_base.r2_account_metrics_daily)
  • VB is daily, Lighthouse aggregates monthly
  • Different customer scoping filters
  • VB tracks IA-tier metrics; Lighthouse doesn't
  • Storage units differ (raw bytes vs GB-months)
  • Lighthouse adds projections, caps, entitlements

Simulation 2 — Concise + sf_account_id

API Shield

Significant Differences
  • Different source tables entirely
  • VB measures billable request volume
  • Lighthouse measures feature adoption + total requests (in separate files)
  • Exposed the file-matching flaw — led to v3 metric-level matching
  • Tested with sf_account_id = 001o000000aLh2XAAS

Concise mode output for API Shield

Here's what the skill actually produced when asked to compare API Shield for a specific customer. This is what you'd paste into Slack.

API Shield sf_account_id = 001o000000aLh2XAAS
Numbers are expected to differ (different data sources and measurement methods)
  • The billing system measures billable API request volume — it counts HTTP requests on zones with an API Shield subscription, filtered to specific request types, and only for enterprise zones
  • Lighthouse measures two different things: API Shield feature adoption (endpoints discovered, schemas uploaded) and total request volume by status code — it counts all HTTP requests on all active zones, with no subscription or request-type filtering
  • The request counts in billing vs Lighthouse come from different underlying data sources that measure HTTP traffic differently, so exact matches are not expected even for the same zone
  • If you're comparing request numbers, expect Lighthouse to show higher counts because it includes all request types and all zone plans, while billing only counts specific filtered requests on enterprise zones
A ready-to-run validation query is also included; paste it into BigQuery to see the data side-by-side for this customer.
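For illustration, a customer-scoped side-by-side query could be assembled along these lines (the table and column names are invented placeholders; the skill derives the real ones from the repos' SQL):

```python
def validation_query(vb_table, lh_table, sf_account_id):
    """Build a side-by-side BigQuery comparison filtered to one customer.

    Placeholder schema: both tables are assumed to expose usage_date,
    metric_value, and sf_account_id columns.
    """
    return f"""
SELECT vb.usage_date,
       vb.metric_value AS vb_value,
       lh.metric_value AS lh_value,
       vb.metric_value - lh.metric_value AS diff
FROM `{vb_table}` AS vb
JOIN `{lh_table}` AS lh
  ON vb.usage_date = lh.usage_date AND vb.sf_account_id = lh.sf_account_id
WHERE vb.sf_account_id = '{sf_account_id}'
ORDER BY diff DESC
""".strip()

print(validation_query("proj.vb.usage_daily", "proj.lh.usage_daily", "001o000000aLh2XAAS"))
```

Pre-filtering on sf_account_id is what makes the output paste-ready: the stakeholder's customer is already baked into the WHERE clause.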

How we got here

The skill went through 3 major iterations, each driven by real-world testing and feedback.

Version 1.0
File-level comparison with 11-section report
Initial draft. Compared one SQL file from each repo. Output was an 11-section technical report covering source tables, date filtering, customer filters, metrics, aggregation, config, and more. Validated against R2 product.
+ Core workflow + variable_billing/ exclusion + Decision rules
Version 2.0
Customer-scoped queries + concise/verbose modes
Added sf_account_id support, since the most common use case is a stakeholder asking about one specific customer. Introduced concise mode (3 sections) and trimmed verbose mode from 11 sections to 5, because the v1 report was still too long.
+ sf_account_id filtering + Concise / verbose modes ~ Report trimmed 11 → 5 sections
Version 3.0 — Corrections
Metric-level matching + non-technical concise output
Testing with API Shield revealed a critical flaw: the skill was matching by filename, but the relevant "requests" metric was in a different file (api_request_status_by_zone_monthly_query.sql) than the main product file. Switched to metric-level matching across ALL SQL files. Also rewrote concise mode tone guidelines for non-technical stakeholders.
! Fixed file-level matching flaw ~ Metric-level matching across all files ~ Concise mode: non-technical tone + Tone guidelines for plain language
Published
Quality score: 13/13 — Published to Skills API
Final version validated and published. Available as a local skill in the variable-usage repo and via the Skills API registry.
+ Published to Skills API + Saved locally to .opencode/skills/

What we caught and fixed during development

! variable_billing/ trap

Lighthouse has a queries/bigquery/variable_billing/ folder that copies VB output. Comparing against it would always show "match" — misleading. Added explicit exclusion rule.

! File-level matching failure

API Shield simulation showed that the "requests" metric lived in a different file than expected. Switched from file-level to metric-level matching across all SQL files.

! Too-technical concise output

First concise mode still used SQL jargon. Added explicit tone guidelines: say "billing system" not "variable-usage pipeline", "enterprise zones" not "zone_plan = ENT".

OpenCode patted itself on the back — but it was wrong!

...but it also took feedback well!

After the first API Shield simulation, OpenCode declared success — saying the skill "handles well" the case where pipelines measure completely different things. But it missed that a different lighthouse file actually had the comparable requests metric. One nudge later, the entire matching approach was redesigned.

This was a particularly interesting test case because it revealed something the skill handles well: when the two pipelines measure completely different things for the same product. The skill correctly identified that VB tracks billable request volume while Lighthouse tracks feature adoption metrics, and adapted the validation query to be a diagnostic "what does each pipeline see" query rather than a traditional diff.
⚠️ What it missed: The lighthouse repo had api_request_status_by_zone_monthly_query.sql with a requests metric that actually was comparable to VB's sum_requests — but the skill never looked at that file because it was matching by filename, not by metric.
The fix: One piece of feedback later, the entire Phase 2 was redesigned — from file-level matching to metric-level matching across ALL SQL files. This became the defining feature of v3.