V4.3 Shadow Model Note

Status: V4.3-shadow task-weighted shadow model

Promoted live

This page explains release impact and readiness. It does not replace the formula spec on Methodology, the threshold detail on Appendix, the schema/download contract on Data, or the citation layer on Research.

Comparison baseline

V4.2

pre-promotion live release used for comparison

Required inputs ready

4/4

present locally for shadow scoring

Published task-native occupations

485

occupations currently scored with the task-native shadow model

Validation comparison

2/3

current match-or-improve gates passing

Published shadow artifacts

The shadow layer is now auditable as data, not just as a readiness note.

Shadow scores

Per-occupation task-adjusted scores and fallback status.

Comparison summary

Score deltas, band flips, and anchor-review counts versus V4.2.

Validation comparison

BLS, family, and cluster comparisons against the live baseline.

Current shadow validation deltas: cluster -0.6667, BLS -0.0471, family -0.2037.

What Changes Already Affect Users

Now live in V4.3

Bootstrap uncertainty intervals are published on occupations today.
Structural risk and near-term risk are separated in the forecast layer.
Task-primitives fields now publish weighted evidence where normalized O*NET task matches exist; sparse occupations remain explicitly null.
The release and governance surfaces now expose shadow-model readiness instead of hiding it.
485 occupations now have published task-native shadow scores for comparison against the live baseline.

What the audit trail still preserves

The V4.3 shadow model has already been promoted into the live structural release.

Remaining Input Gaps

All required local shadow-model inputs are now present.

Input Readiness

anthropic task penetration

data/raw/external/anthropic_task_penetration.csv

present

onet task statements

data/raw/external/onet/Task_Statements.txt

present

onet task ratings

data/raw/external/onet/Task_Ratings.txt

present

empirical mobility

data/raw/external/sg_empirical_mobility.json

present
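
The 4/4 readiness count could be reproduced with a check along the following lines. This is a minimal sketch, not the project's actual tooling: the helper name `input_readiness` and its `root` parameter are hypothetical, and only the four relative paths are taken from the list above.

```python
from pathlib import Path

# Labels and relative paths as listed under Input Readiness.
REQUIRED_INPUTS = {
    "anthropic task penetration": "data/raw/external/anthropic_task_penetration.csv",
    "onet task statements": "data/raw/external/onet/Task_Statements.txt",
    "onet task ratings": "data/raw/external/onet/Task_Ratings.txt",
    "empirical mobility": "data/raw/external/sg_empirical_mobility.json",
}


def input_readiness(root="."):
    """Return ({label: 'present' | 'missing'}, 'n/total') for the required inputs.

    Hypothetical helper: `root` is assumed to be the repository root.
    """
    base = Path(root)
    status = {
        label: "present" if (base / rel).is_file() else "missing"
        for label, rel in REQUIRED_INPUTS.items()
    }
    ready = sum(1 for state in status.values() if state == "present")
    return status, f"{ready}/{len(status)}"
```

Calling `input_readiness()` from the repository root would return the per-input states plus a summary string in the same n/total form as the "Required inputs ready" card.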

Coverage Snapshot

Occupations

562

current published universe

Direct mapped

521

eligible for the direct coverage gate

Median direct matched task share

100%

current direct-coverage gate basis

Task-weighted share

86%

headline score still untouched

Promotion Gates

Gate	Detail	Threshold	Actual	State
Median matched task weight share across direct-mapped occupations	This gate prevents a sparse task layer from directly changing the headline score before task matching is broadly comparable.	>= 0.6	1	pass
Experimental task-adjusted score matches or improves current validation diagnostics	Requires at least 2 of 3 external checks to match or improve baseline. Current results (shadow vs baseline): cluster directional accuracy 0.3333 vs 1; BLS rho -0.1908 vs -0.1437; family rho -0.4457 vs -0.242.	at_least_2_of_3	2/3	pass
No implausible anchor label flips without written rationale	8/8 anchors screened; 0 candidates still need editorial sign-off.	zero_unexplained_flips	0	pass
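
The at_least_2_of_3 gate can be illustrated with a short sketch. The (shadow, baseline) pairs are copied from the gate detail above; everything else is an assumption. In particular, the source does not say which direction counts as "match or improve" for each check, so the sketch assumes that accuracy improves when it rises and that the two rho checks improve when they become more negative. That direction was chosen because it agrees with the reported 2/3 result, not because the source confirms it. The function name `validation_gate` is hypothetical.

```python
# (shadow, baseline) pairs copied from the validation gate's detail text.
CHECKS = {
    "cluster_directional_accuracy": (0.3333, 1.0),
    "bls_rho": (-0.1908, -0.1437),
    "family_rho": (-0.4457, -0.242),
}

# ASSUMPTION (not stated in the source): accuracy improves when it rises,
# while the two rho checks improve when they become more negative.
HIGHER_IS_BETTER = {
    "cluster_directional_accuracy": True,
    "bls_rho": False,
    "family_rho": False,
}


def validation_gate(checks, higher_is_better, required=2):
    """Return (deltas, passes, gate_ok) for an at-least-`required` gate."""
    # Delta is shadow minus baseline, rounded to the four places the page reports.
    deltas = {
        name: round(shadow - baseline, 4)
        for name, (shadow, baseline) in checks.items()
    }
    # A check passes when its delta is zero (match) or moves in the assumed good direction.
    passes = {
        name: (delta >= 0) if higher_is_better[name] else (delta <= 0)
        for name, delta in deltas.items()
    }
    return deltas, passes, sum(passes.values()) >= required


deltas, passes, gate_ok = validation_gate(CHECKS, HIGHER_IS_BETTER)
print(deltas)
print(passes, gate_ok)
```

Under the stated direction assumption, the sketch reproduces the published deltas (cluster -0.6667, BLS -0.0471, family -0.2037) and a 2-of-3 pass, with the cluster check as the one failing.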

What V4.3 Proved

V4.3 proved that task evidence works best as a disciplined exposure upgrade, not as a wholesale rewrite of the structural formula. The candidate task-native formulas remain published as research scaffolding for V5, not as live rules.

effective_coverage = Σ_t w_it · exposure_t · success_t

net_risk = automation_pressure_i · (1 - λ · concentration_i) · market_modifier_i
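
As a rough illustration of how the two candidate formulas compose, here is a minimal sketch. It is not the published scoring code. The page does not define the symbols, so the sketch assumes w_it is the weight of task t within occupation i, that exposure_t, success_t, and concentration_i lie in [0, 1], and that λ (`lam`) is a scalar tuning parameter. Feeding effective coverage in as automation pressure is also an assumption, since the source does not state how the two are linked. All input values are made up.

```python
def effective_coverage(tasks):
    """Sum of w_it * exposure_t * success_t over one occupation's tasks.

    `tasks` is an iterable of (weight, exposure, success) tuples.
    """
    return sum(weight * exposure * success for weight, exposure, success in tasks)


def net_risk(automation_pressure, concentration, market_modifier, lam):
    """automation_pressure_i * (1 - lam * concentration_i) * market_modifier_i."""
    return automation_pressure * (1 - lam * concentration) * market_modifier


# Made-up inputs for a single occupation: (weight, exposure, success) per task.
tasks = [(0.5, 0.8, 0.5), (0.3, 0.4, 0.5), (0.2, 0.0, 1.0)]
coverage = effective_coverage(tasks)

# ASSUMPTION: effective coverage is used directly as automation pressure.
risk = net_risk(
    automation_pressure=coverage,
    concentration=0.5,
    market_modifier=1.0,
    lam=0.4,
)
print(coverage, risk)
```

The second formula only scales the pressure term: higher concentration (with positive λ) discounts it, and the market modifier multiplies whatever remains.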

What Must Happen Next

Keep the full V4.2 vs V4.3 comparison published so the promotion remains auditable.
Start V5 as sidecar workstreams: augmentation heterogeneity, empirical mobility, posterior uncertainty, and realized-risk forecasting.
Do not absorb multiple new constructs into the live score without separate sidecar validation first.
Treat the current empirical mobility prior as supporting evidence until a higher-granularity Singapore transition dataset exists.