Federated Healthcare Fraud Defense (SignGuard)
A secure federated learning framework that lets US states collaboratively train healthcare fraud detection models without sharing sensitive patient data.
Tags
Technologies
Architecture
SignGuard Defense Pipeline
Multi-layered defense against adversarial attacks in federated learning environments
Overview
Machine learning is a critical tool for identifying anomalous provider behavior in national healthcare systems. However, centralizing highly sensitive claims data (like Medicare/Medicaid) poses severe compliance and privacy risks (HIPAA). Federated Learning (FL) offers a paradigm to collaboratively train fraud detection models across decentralized institutional silos (US states) without transferring raw patient histories.
However, Vanilla FL is wildly vulnerable to Byzantine failures and Data Poisoning via Sybil networks. This repository contains the implementation of SignGuard, a hybrid cryptographic defense mechanism evaluated against the largest public healthcare dataset to dateβ227 million real-world HHS Medicaid provider claims partitioned across 54 United States jurisdictions.
π The SignGuard Architecture
SignGuard strictly decouples identity non-repudiation from gradient validation.
1. Cryptographic Identity (ECDSA NIST P-256)
All participants generate an Elliptic Curve key pair. The aggregation server drops any update failing verification, neutralizing network spoofing.
2. Statistical Validation & Reputation
Accepted identities have their parameter gradients mathematically audited:
- L2 Norm Thresholding: Reject structural explosions inherent to gradient ascent poisoning.
- Cosine Alignment: Angle determination against the historical global memory vector to detect coordinated subversion.
- Reputation Ledger (EMA): Rejections immediately decay trust exponentially, permanently isolating Sybil participants.
π Empirical Evaluation Matrix
- Random Poisoning (Sensory failures)
- Model Poisoning (Gradient Ascent / Targeted degradation)
- Label Flipping (Collusion data-poisoning)
- Free Riding (Compute theft)
- Sybil Networks (Coordinated identity spoofing)
SignGuard actively neutralizes these structural and Sybil manipulation injections while preserving AUPRC within 1.8% of theoretically optimal centralized baselines.
βοΈ Evolution & Core Research
The real-world implementation applied to Medicaid stems from the core theoretical research on ECDSA-Based Federated Learning Defense.
The Underlying Security Proof
Federated Learning (FL) is vulnerable to Byzantine attacks where malicious clients submit poisoned model updates. Existing solutions (Krum, Multi-Krum) lack cryptographic verification of client identity.
SignGuard introduces three core algorithmic components under the hood:
- Signature Generator: Clients sign their model updates using ECDSA private keys (SECP256R1), ensuring absolute identity non-repudiation.
- Verification Engine: The central aggregator verifies ECDSA signatures in ~1.2ms per client before performing any compute-heavy aggregation logic, instantly dropping unauthenticated packets.
- Reputation Manager: Verified updates undergo statistical anomaly detection. Anomalous vectors immediately decay trust exponentially, permanently isolating Sybil participants over multiple rounds.