Research Methodology

The 411bz Research Methodology defines how the Structural Authority Score engine is validated, calibrated, and governed. Every scoring decision is tested against a CMS-diverse Probe Observatory, verified through SHA-256 determinism checks, and subjected to adversarial stress testing before deployment. Calibration follows strict 60-day windows with documented extraction patch governance.

Probe Observatory

The Probe Observatory is a curated dataset of 156+ websites used to validate SAS scoring accuracy, stability, and fairness across the full diversity of real-world web architectures. It is not a convenience sample. Every probe is selected to represent a distinct structural archetype.

8 CMS Types

WordPress
Shopify
Wix
Squarespace
Webflow
Next.js
Hugo
Custom HTML

Diversity Requirements

Probes span multiple verticals (professional services, e-commerce, SaaS, healthcare, legal, education, media), structural complexity levels (minimal single-page sites to enterprise multi-domain architectures), geographic regions, and language configurations. No single CMS type represents more than 25% of the observatory.

Adversarial Probes

20 adversarial probes are specifically designed to stress the scoring engine. These include sites with inflated schema (valid but excessive JSON-LD), FAQ spam (high-volume low-quality Q&A pairs), headless rendering (JavaScript-only content delivery), floor compression (sites engineered to appear minimal), and enterprise complexity (multi-brand, multi-domain architectures). Adversarial probes ensure the engine does not reward structural gaming.

Structural Authority Score reference →

Calibration Process

SAS calibration is a structured, multi-phase process that validates every aspect of the scoring engine before weights are finalized. No calibration change is made without statistical evidence from the full probe dataset.

1. Determinism Testing

Every probe site is scanned a minimum of 3 times. The scoring output must produce identical SHA-256 hashes across all runs. Stable JSON serialization with sorted keys eliminates object-order nondeterminism. Any hash mismatch triggers an immediate investigation.

Verification method: SHA-256 hash comparison of full scoring output with stable key ordering.
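The determinism check described above can be sketched as follows. This is an illustrative implementation, not 411bz's actual code; the function names and the shape of the scoring-output dictionary are assumptions.

```python
import hashlib
import json

def output_hash(scoring_output: dict) -> str:
    """Hash a scoring output using stable JSON serialization.

    sort_keys=True fixes key order; compact separators avoid
    whitespace differences between serializers.
    """
    canonical = json.dumps(scoring_output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_deterministic(runs: list[dict]) -> bool:
    """True if all scoring runs (minimum 3) produce identical hashes."""
    assert len(runs) >= 3, "determinism requires at least 3 runs"
    return len({output_hash(r) for r in runs}) == 1
```

Because keys are sorted before hashing, two runs that emit the same values in a different object order still hash identically, which is exactly the object-order nondeterminism the stable serialization is meant to eliminate.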

2. Distribution Shape Analysis

SAS score distribution across the full probe set is monitored for compression (scores clustering too tightly), skew (disproportionate weight to one end), and bimodal artifacts (unexpected clustering around two distinct scores). A healthy distribution reflects genuine structural diversity without scoring-induced distortion.

Failure condition: Any distribution anomaly triggers weight review before calibration proceeds.
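A minimal sketch of the three distribution checks, using only the Python standard library. The thresholds (`min_stdev`, `max_skew`) and the coarse-histogram bimodality heuristic are illustrative placeholders, not 411bz's published values or method.

```python
import statistics

def distribution_flags(scores, min_stdev=8.0, max_skew=1.0):
    """Flag compression (tight clustering) and skew in a score distribution."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    n = len(scores)
    # Fisher-Pearson sample skewness
    skew = (sum((x - mean) ** 3 for x in scores) / n) / (sd ** 3)
    return {"compressed": sd < min_stdev, "skewed": abs(skew) > max_skew}

def looks_bimodal(scores, bins=10):
    """Crude bimodality check: count strict peaks in a coarse histogram."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for x in scores:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    peaks = 0
    for i in range(bins):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < bins - 1 else 0
        if counts[i] > left and counts[i] > right:
            peaks += 1
    return peaks >= 2
```

In practice a production system would use a proper statistical test (e.g. a dip test) for multimodality; the histogram version above only shows the shape of the check.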

3. Dimension Correlation Studies

No single dimension should explain more than 35% of total SAS variance. Correlation studies verify that dimensions remain independently informative. If two dimensions become highly correlated, the model risks redundancy and the calibration review evaluates whether weight redistribution is necessary.

Threshold: Maximum 35% variance explained by any single dimension.

4. Weight Sensitivity Simulation

Simulated weight perturbations confirm ranking stability. Small changes to individual dimension weights should not produce disproportionate shifts in relative probe rankings. Sensitivity testing ensures the model is robust to minor calibration adjustments and does not exhibit cliff-edge behavior.
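A sketch of the perturbation loop: nudge each weight by a small epsilon, re-rank the probes, and report the worst rank displacement any probe suffers. The data shapes, epsilon value, and displacement metric are illustrative assumptions.

```python
def rank_order(scores):
    """Probe names sorted by descending score."""
    return [name for name, _ in sorted(scores.items(), key=lambda kv: -kv[1])]

def sensitivity(probe_dims, weights, epsilon=0.02):
    """Worst rank displacement after perturbing each weight by +epsilon.

    probe_dims: {probe: {dimension: raw score}}; weights: {dimension: weight}.
    """
    def score(w):
        return {p: sum(w[d] * v for d, v in dims.items())
                for p, dims in probe_dims.items()}

    base = rank_order(score(weights))
    worst = 0
    for dim in weights:
        perturbed = dict(weights, **{dim: weights[dim] + epsilon})
        new = rank_order(score(perturbed))
        displacement = max(abs(base.index(p) - new.index(p)) for p in base)
        worst = max(worst, displacement)
    return worst
```

A stable calibration keeps the returned displacement small; a large value for a tiny epsilon is the cliff-edge behavior the simulation is designed to catch.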

5. Adversarial Testing

The 20 adversarial probes are scored and evaluated against expected behavioral constraints. Schema inflation must not produce outsized SAS gains. FAQ spam must not dominate scoring. Headless-rendered content must be scored equivalently to server-rendered content where structural signals are identical.

Constraint: No adversarial probe may score in the top 10% of the observatory without genuine structural merit.
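The top-10% constraint is straightforward to check mechanically. This sketch assumes probe scores keyed by an identifier and a known set of adversarial probe IDs; names are illustrative.

```python
def adversarial_violations(scores, adversarial_ids, top_fraction=0.10):
    """Adversarial probes scoring inside the top decile of the observatory.

    scores: {probe id: SAS score}. Returns the violating probe ids.
    """
    ranked = sorted(scores, key=lambda p: -scores[p])
    cutoff = max(1, int(len(ranked) * top_fraction))
    top = set(ranked[:cutoff])
    return sorted(top & set(adversarial_ids))
```

Any non-empty return would then require the manual "genuine structural merit" review before calibration can proceed.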

60-Day Calibration Windows

SAS scoring weights are locked during 60-day calibration windows. This ensures scoring stability for all entities being measured and prevents reactive adjustments that could undermine trust in the measurement system.

Authority Engineering reference →

Extraction Patch Policy

When a scoring anomaly is identified, an extraction patch may be deployed to correct the extraction logic. Patches are not applied reactively. Four strict conditions must be satisfied before any extraction patch is deployed.

Condition 1: Reproducibility

The issue must be reproducible across 3 or more probe sites. Single-site anomalies do not warrant extraction patches. Reproducibility confirms the issue is systemic, not site-specific.

Condition 2: HTML-Verifiable Signal

The signal causing the anomaly must be directly verifiable in the HTML source. The scoring engine extracts from HTML. If the signal is not present in the raw HTML, it is not a valid extraction target.

Condition 3: Post-Patch Distribution Stability

After applying the patch to the full probe dataset, the overall SAS distribution must remain stable. The patch must not introduce compression, skew, or bimodal artifacts into the distribution shape.

Condition 4: Determinism Maintained

SHA-256 hash verification must pass on all probe sites after the patch is applied. If the patch introduces any nondeterministic behavior, it is rejected regardless of the extraction improvement it provides.
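The four conditions form an all-or-nothing gate, which can be summarized as a simple conjunction. The field and function names below are illustrative, not part of 411bz's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class PatchEvidence:
    reproduced_on: int          # probe sites reproducing the anomaly (Condition 1)
    html_verifiable: bool       # signal present in raw HTML source (Condition 2)
    distribution_stable: bool   # post-patch shape checks passed (Condition 3)
    deterministic: bool         # SHA-256 verification passed on all probes (Condition 4)

def may_deploy(evidence: PatchEvidence) -> bool:
    """All four conditions must hold; any single failure rejects the patch."""
    return (
        evidence.reproduced_on >= 3
        and evidence.html_verifiable
        and evidence.distribution_stable
        and evidence.deterministic
    )
```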

CMS Bias Detection

The scoring engine must not systematically favor or penalize any CMS platform. CMS bias detection is a continuous validation process applied during every calibration cycle.
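One simple form of CMS bias detection is comparing each platform's mean SAS against the observatory-wide mean. The tolerance value here is an illustrative placeholder, not a published 411bz threshold, and a production check would likely also account for per-platform sample sizes.

```python
import statistics

def cms_bias(scores_by_cms, tolerance=5.0):
    """Flag CMS platforms whose mean SAS deviates from the global mean.

    scores_by_cms: {cms name: [SAS scores for probes on that CMS]}.
    Returns {cms name: deviation} for platforms outside the tolerance.
    """
    all_scores = [s for scores in scores_by_cms.values() for s in scores]
    global_mean = statistics.fmean(all_scores)
    return {
        cms: statistics.fmean(scores) - global_mean
        for cms, scores in scores_by_cms.items()
        if abs(statistics.fmean(scores) - global_mean) > tolerance
    }
```

A systematic positive deviation for one platform would indicate the engine is rewarding CMS-specific markup rather than genuine structure, triggering weight review during the calibration cycle.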

Authority Glossary →

Frequently Asked Questions

What is the Probe Observatory?

The Probe Observatory is a curated dataset of 156+ websites across 8 CMS types (WordPress, Shopify, Wix, Squarespace, Webflow, Next.js, Hugo, and custom HTML) used to validate SAS scoring robustness. Probes span multiple verticals, structural complexity levels, and geographic regions to ensure accurate, unbiased scoring.

How does 411bz ensure SAS is deterministic?

Determinism is verified through SHA-256 hash comparison. Every probe site is scanned a minimum of 3 times, and the scoring output must produce identical hashes across all runs. Stable JSON serialization with sorted keys eliminates object-order nondeterminism. Any hash mismatch triggers an immediate investigation and extraction patch review.

What conditions must be met before an extraction patch is deployed?

Four conditions must be satisfied: the issue must be reproducible across 3 or more probe sites, the signal must be verifiable directly in HTML source, post-patch score distribution must remain stable, and determinism must be maintained with SHA-256 hash verification passing on all probes. No patch is deployed unless all four conditions are met.

View all 30 frequently asked questions →

See the Engine in Action

Run a free Structural Authority Score scan and see how the calibrated engine evaluates your site across 8 dimensions.

Run Free SAS Scan