Fix Geometric Embedding Bias: Achieve <1% RSA Factor Accuracy
Hey guys, let's talk about a super cool breakthrough in our cryptography research! We've been hitting this frustrating wall, around 4% error, in our physics-guided factorization pipeline, and guess what? It turns out the culprit is hiding in plain sight within the geometric_embedding stage. This isn't some random noise; it's a systematic mis-centering that's been throwing our initial search window way off. But don't sweat it, because we've got a plan – a user story, in fact – to obliterate this 4% barrier and push our pure resonance pipeline to achieve less than 1% relative distance to the true RSA-2048 factor. This means bounded refinement becomes a cakewalk, making our whole operation way more efficient and robust. Let's dive into how we're going to make this happen!
The Stubborn 4% Wall: Understanding the Problem
So, what's the deal with this 4% wall, you ask? Well, based on some detailed instrumentation we've been running (shoutout to Issue #196 for the insights!), we've pinpointed that a whopping 100% of the ~3.9%–4.0% relative error we're seeing originates from a single place: the geometric_embedding stage. This stage is supposed to set up a search window around the square root of our number N, aiming to nail down the prime factors. However, for balanced RSA-2048 numbers, it's consistently picking a center that's about 3.92% higher than the actual smaller prime factor, let's call it p_true. Think of it like aiming a bit too high on a dartboard – you're close, but not quite on the bullseye. The kicker is that all the subsequent stages in our pipeline – like the kappa-weighting, Dirichlet calculations, and interference selection – are just refining within this already biased window. They don't have the ability to recenter and correct that initial geometric offset. So, that pesky 4% isn't some mysterious random error; it's a deterministic geometric offset baked in right from the start. If we can fix this offset at the geometric_embedding stage, we won't need to go through a massive overhaul of the entire pipeline just to shave off that last bit of error. This transforms our resonance factorization from an interesting theoretical concept into a powerful, engineered preconditioner for RSA factorization, which is pretty darn cool if you ask me!
Our Mission: Precision Centering for RSA Success
Our mission, should we choose to accept it (and we totally do!), is to surgically correct the systematic mis-centering introduced by geometric_embedding. By doing this, we're aiming to align the initial search window precisely with the true prime factor, instead of being stuck with that ~4% offset. What does this mean for our sweet, sweet pipeline? It means the pure resonance pipeline, running CPU-only and without any fancy finisher extras, will land within less than 1% relative distance of the true RSA-2048 factor. This makes bounded refinement practically trivial, a piece of cake, a walk in the park! We're talking about taking our factorization game from 'pretty good' to 'exceptionally precise' without adding unnecessary complexity. This isn't just about hitting a number; it's about making our cryptographic research more efficient, reliable, and robust. We're building a more solid foundation for future breakthroughs, ensuring that our efforts are focused on genuine innovation rather than battling a predictable, geometric quirk. This is how we level up our game, guys!
Acceptance Criteria: Our Roadmap to Precision
To make sure we nail this, we've laid out some clear acceptance criteria. Think of these as our checkpoints to ensure we're on the right track and that our fix is solid.
1. Bias Measurement: Quantifying the Error
First off, we need to get a crystal-clear picture of the bias. This means adding some new instrumentation to our system. We'll be recording:
- `center_raw(N)`: This is the raw center that `geometric_embedding` currently picks for the search window. It's our baseline measurement.
- `p_true`: This is the ground truth – the actual, correct smaller prime factor of our number N.
- `bias(N) = (center_raw(N) - p_true) / p_true`: This formula gives us the relative error, our key metric for quantifying the bias. It tells us exactly how far off `geometric_embedding` is.
We're not just going to test this on one number, oh no. We'll be running these measurements on:
- Our current standard balanced RSA-2048 modulus.
- A second, distinct balanced RSA-2048 modulus to ensure consistency.
- At least one smaller balanced modulus, like RSA-1024 or RSA-512, to see how the bias behaves across different scales.
- And importantly, one skewed semiprime where one factor is much smaller than the other (`p << q`), to check if the bias is affected by the balance of the factors.
All these results will be neatly documented in a new file: docs/geometric_embedding_bias_calibration.md. The pass condition here is straightforward: we need to have a recorded numeric bias for each test modulus, and each entry must be clearly labeled with its modulus_profile (like balanced_2048, balanced_512, or skewed_2048). This gives us the data we need to understand the problem inside and out.
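The bias metric above is simple enough to sketch directly. Here's a minimal Python illustration; the toy numbers are illustrative, not RSA-scale, and `center_raw` stands in for whatever value `geometric_embedding` actually produces:

```python
def measure_bias(center_raw: int, p_true: int) -> float:
    """bias(N) = (center_raw(N) - p_true) / p_true, the relative error."""
    return (center_raw - p_true) / p_true

# Toy example: N = 101 * 103, so the true smaller factor is 101.
# Pretend the embedding overshoots and centers the window at 105.
bias = measure_bias(center_raw=105, p_true=101)
print(f"bias = {bias:.4%}")  # roughly +3.96%, mimicking the observed offset
```

Each measured value would then be logged alongside its `modulus_profile` label in the calibration file.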
2. Bias Model Definition: Finding the Pattern
With our bias quantified, the next step is to define a deterministic correction model, which we'll call bias_model(N). This model will predict the systematic offset that geometric_embedding is introducing. We're not going the fancy Machine Learning route here, guys. We want a simple, closed-form, and reproducible function. We'll start by exploring the simplest models that have a good chance of generalizing:
- A constant bias for each profile (e.g., a fixed +3.92% for `balanced_2048`).
- Or, a low-complexity analytic form based on the scale of N. This could look something like `a + b / log(N)` or `a + b / bits(N)`, where `a` and `b` are constants we determine.
Crucially, this model must be a closed-form, reproducible function. No ML, no stochastic fitting – just pure, predictable math. The pass condition is that bias_model(N) is expressed as a pure function in code. It will take only deterministic inputs – N itself, its bit length, and its balance classification – and return a scalar bias estimate. This documented function will live right alongside our bias measurements in the calibration file.
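As a sketch of what such a closed-form model could look like in code: the constants below are placeholders to be filled in from the calibration study, not measured values, and the two-argument signature (N plus a profile label) is an assumption about how the balance classification gets passed in.

```python
def bias_model(N: int, profile: str) -> float:
    """Deterministic, closed-form bias estimate.

    Inputs are only N, its bit length, and its balance classification
    (profile); no randomness, no fitting at call time.
    """
    # Per-profile constant biases (placeholder values, to be calibrated).
    constant_bias = {"balanced_2048": 0.0392}
    if profile in constant_bias:
        return constant_bias[profile]
    # Fallback analytic form a + b / bits(N); a and b are hypothetical
    # constants that would come out of the calibration data.
    a, b = 0.0, 80.0
    return a + b / N.bit_length()
```

Because the function is pure, re-running it on the same modulus always yields the same estimate, which is exactly the reproducibility property the pass condition demands.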
3. Corrected Embedding: Implementing the Fix
Now for the exciting part: actually implementing the fix! We'll modify, or perhaps wrap, the existing geometric_embedding function. Here's how it'll work:
- It will compute `center_raw(N)` just like it does now.
- It will then call our newly defined `bias_est = bias_model(N)` to get our predicted bias.
- Finally, it will calculate the corrected center: `center_corrected(N) = center_raw(N) / (1 + bias_est)`.
This center_corrected(N) is what we'll use as the window center for all subsequent resonance and refinement steps within the PURE RESONANCE pipeline. And here’s the important bit: this fix doesn't introduce any new local search windows, GCD probing, or brute force. It's purely a deterministic recentering of the initial search basin. The pass condition is that this corrected center is properly plumbed into the pipeline. We'll enable it using a flag, something like use_bias_correction=True, and this will be the default setting in our new evaluation harness.
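The recentering itself reduces to one line of arithmetic. A minimal sketch, reusing the toy numbers from the bias example above; rounding the corrected center to the nearest integer is an assumption, not something the spec pins down:

```python
def corrected_center(center_raw: int, bias_est: float) -> int:
    """center_corrected(N) = center_raw(N) / (1 + bias_est)."""
    return round(center_raw / (1 + bias_est))

# With a raw center of 105 and an estimated bias of ~3.96%, the
# correction lands the window back on the true factor, 101.
print(corrected_center(105, 0.0396))  # -> 101
```

Note that this is pure deterministic recentering: no new search windows, no GCD probing, just dividing out the predicted relative offset.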
4. Re-run Harness Under PURE RESONANCE Rules: Proving the Improvement
With the corrected embedding in place, it's time to put it to the test. We'll re-run our RSA-2048 harness, specifically focusing on the balanced modulus case, under the strict PURE RESONANCE rules. This means:
- CPU-only: No GPU acceleration.
 - No finisher: We're sticking to the core resonance physics.
 - No bounded local search window: We rely solely on the corrected initial window.
 - No trial division or GCD pokes beyond what was already present in the baseline.
 - Deterministic configuration: Ensuring reproducibility.
 
After running this, we'll calculate two key metrics:
- `best_abs_distance = |candidate_best - p_true|`: The absolute difference between our best found candidate and the true factor.
- `best_rel_distance = best_abs_distance / p_true`: The relative distance, our primary measure of success.
The pass condition here is a big one: best_rel_distance < 0.01. That's right, we need to achieve less than 1% relative distance for balanced RSA-2048. This new, improved wall will be documented in docs/GOAL.md as GOAL_PHASE_NEXT satisfied for that modulus profile. This proves our geometric correction is the real deal!
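The two metrics and the pass condition reduce to a few lines. In this sketch the candidate and factor are made-up toy values, not harness output:

```python
def evaluate(candidate_best: int, p_true: int) -> tuple[int, float]:
    """Compute (best_abs_distance, best_rel_distance) for a run."""
    best_abs_distance = abs(candidate_best - p_true)
    best_rel_distance = best_abs_distance / p_true
    return best_abs_distance, best_rel_distance

abs_d, rel_d = evaluate(candidate_best=10_007, p_true=10_009)
assert rel_d < 0.01  # the <1% pass condition
```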
5. CI Guard: Ensuring Long-Term Integrity
Finally, we need to make sure this awesome improvement sticks around and doesn't get accidentally broken or circumvented later. We'll implement a Continuous Integration (CI) test that acts as our guardian. This test will assert:
- The corrected-embedding harness is still tagged with `PHASE="PURE_RESONANCE"`.
- None of the finisher options (`allow_local_refine`, `search_radius`, etc.) are accidentally enabled.
- And most importantly, `best_rel_distance < 0.01` for our canonical balanced RSA-2048 test modulus.
The pass condition for this CI test is crucial: if someone later tries to relax these constraints – re-enabling a finisher option, dropping the PURE_RESONANCE tag, or letting the relative distance creep back above 1% – the CI test must fail and flag the regression before it lands.
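As a sketch, the guard could be a small pure check over the harness result that CI calls after each run. Every field name here (`phase`, `config`, `best_rel_distance`) is an assumption about what the real harness emits, not its actual schema:

```python
FORBIDDEN_FINISHER_OPTIONS = ("allow_local_refine", "search_radius")

def check_pure_resonance_guard(result: dict) -> None:
    """Raise AssertionError if the PURE RESONANCE contract is violated."""
    assert result["phase"] == "PURE_RESONANCE", "harness lost its phase tag"
    for opt in FORBIDDEN_FINISHER_OPTIONS:
        assert not result["config"].get(opt), f"finisher option enabled: {opt}"
    assert result["best_rel_distance"] < 0.01, "regressed past the 1% wall"
```

Wiring this into CI means any later change that quietly turns a finisher back on, or lets the accuracy slip, fails the build instead of silently eroding the result.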