LOCAL-FIRST · ZERO DATA LEAKAGE · GDPR READY

Your Documents Never Leave Your Machine

DataCleaner AI is a local-first PII detection and redaction engine. Scan PDFs, emails, databases, and spreadsheets for sensitive data, powered by your own GPU rather than the cloud.

50+ PII pattern rules
Zero data sent to the cloud
100% local processing
<3s per document scan
โฏ dc scan ./legal-docs/ --redact --style block   DataCleaner AI v0.1.0 | Model: qwen3.5:9b | GPU: RTX 5070 Ti Scanning 4 file(s)...   โœ“ contracts/nda_2026.pdf 47 findings CONTACT(23) IDENTITY(12) FINANCIAL(8) MEDICAL(4) โœ“ hr/employee_records.csv 231 findings CONTACT(89) IDENTITY(76) FINANCIAL(44) PERSONAL_ID(22) โœ“ legal/board_minutes.docx 18 findings PERSON_NAME(12) ADDRESS(4) CREDENTIALS(2) โœ“ emails/customer_support.txt 89 findings CONTACT(45) FINANCIAL_CTX(23) IDENTITY(21)   โ”€โ”€โ”€ Redaction Complete โ”€โ”€โ”€ โœ“ Output: ./redacted/ (4 files, 385 redactions applied) โœ“ Audit Log: ~/.datacleaner/audit/audit_20260502_1423.json โ„น GDPR Article 30 record generated. HIPAA audit trail saved.   โฏ cat redacted/nda_2026.pdf | head -3 This Agreement is made between [REDACTED] ("Disclosing Party") and [REDACTED] ("Receiving Party"), effective as of [REDACTED]. Contact: [REDACTED] | Phone: [REDACTED] | SSN: [REDACTED]

Everything You Need for PII Compliance

DataCleaner pairs the speed of regex with the contextual understanding of a local LLM: no compromise on privacy or performance.

🏠

100% Local Processing

All scanning and redaction happens on your machine. No cloud uploads. No calls to external APIs. No third party, including us, ever sees your data. Your own GPU does the heavy lifting.

Zero Data Leakage
🧠

Dual-Pass AI Detection

Pass 1 (Regex): 50+ patterns catch email addresses, phone numbers, SSNs, credit card numbers, and API keys in milliseconds.
Pass 2 (LLM): Local AI catches contextual PII: names in prose, medical conditions, family relationships, salary figures.

Regex + LLM
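A minimal sketch of how a regex first pass like Pass 1 can work, written in Python. The four patterns, the category names, and the `scan_regex` and `luhn_valid` helpers are illustrative assumptions, not DataCleaner's actual rule set. The Luhn checksum shows one way a structured rule can discard digit runs that merely look like card numbers.

```python
import re
from dataclasses import dataclass

# Illustrative rules only; a real rule set would be much larger and locale-aware.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # US SSN, excluding never-issued area (000, 666, 9xx), group (00) and serial (0000) values.
    "US_SSN": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    "PHONE": re.compile(r"(?<!\d)\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}(?!\d)"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}

@dataclass(frozen=True)
class Finding:
    category: str
    start: int
    end: int
    text: str

def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_regex(text: str) -> list[Finding]:
    """Run every pattern over the text and return findings ordered by position."""
    findings = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if category == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue  # digit run fails the checksum, so it is probably not a card
            findings.append(Finding(category, m.start(), m.end(), m.group()))
    return sorted(findings, key=lambda f: f.start)
```

Because each finding carries character offsets, the same records could be handed to a contextual second pass or to a redaction step.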
📋

Compliance Audit Trails

Every scan generates a cryptographically signed, timestamped audit log, ready for GDPR Article 30, HIPAA Technical Safeguards, CCPA data inventory, and ISO 27001 evidence collection.

GDPR · HIPAA · CCPA
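The page does not say which signature scheme the audit log uses. As a hedged illustration of the general idea, this Python sketch builds a tamper-evident log: each entry records a UTC timestamp and the SHA-256 hash of the previous entry, and an HMAC-SHA256 tag stands in for a real signature. The field names and the `append_entry` and `verify_log` helpers are hypothetical. A production design would more likely use an asymmetric signature (for example Ed25519) so that auditors can verify entries without holding the signing key.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _canonical(body: dict) -> bytes:
    # Stable serialization so the same entry always hashes to the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def append_entry(log: list[dict], event: dict, key: bytes) -> dict:
    """Append an event, chaining it to the previous entry and tagging it with an HMAC."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    payload = _canonical(body)
    entry = dict(body)
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    entry["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    log.append(entry)
    return entry

def verify_log(log: list[dict], key: bytes) -> bool:
    """Recompute hashes and tags; editing, reordering, or deleting a non-final entry fails."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("timestamp", "event", "prev_hash")}
        if body["prev_hash"] != prev_hash:
            return False
        payload = _canonical(body)
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Note that a hash chain alone cannot reveal that the newest entries were truncated; anchoring the latest hash somewhere separate (for example in the exported report) is one common way to close that gap.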
🌐

International PII Coverage

Detects US Social Security numbers, UK National Insurance numbers, Chinese Resident Identity Card numbers, EU IBAN and SWIFT/BIC codes, passport numbers from 30+ countries, and localized phone and address formats. Built for global compliance teams.

30+ Countries
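Structured identifiers such as IBANs carry their own check digits, which lets a detector reject look-alike strings before reporting them. This Python sketch implements the standard ISO 13616 mod-97 check. The `iban_valid` name and the simplified length bounds are illustrative; fuller coverage would also enforce each country's fixed length (22 characters for Germany and the UK, for example).

```python
import re

# Two-letter country code, two check digits, then 11 to 30 alphanumeric BBAN characters.
IBAN_SHAPE = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$")

def iban_valid(candidate: str) -> bool:
    """Return True if the string has IBAN shape and passes the mod-97 check."""
    iban = candidate.replace(" ", "").upper()
    if not IBAN_SHAPE.match(iban):
        return False
    # Move the first four characters to the end, then map letters to numbers (A=10 ... Z=35).
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A single mistyped digit always changes the remainder, so the check filters out most random digit strings that happen to match the shape.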
🔌

CLI-First, API-Ready

Terminal-native for ad-hoc scans and shell scripting. REST API available for CI/CD pipelines, automated workflows, and integration with n8n, Zapier, or Make.com.

CLI + REST API
🖥️

Powered by Your Hardware, Not Our Servers

DataCleaner runs on your NVIDIA RTX GPU, Apple Silicon, or AMD GPU via Ollama. No subscription to cloud AI APIs. No per-token billing. No data ever transmitted over the network. Even license validation is offline: zero network calls, ever.

✓ NVIDIA RTX 3070+ / 5070 Ti
✓ Apple M1/M2/M3/M4
✓ AMD RX 7000+
✓ No internet required
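Ollama serves models over an HTTP API on localhost (port 11434 by default), so a contextual second pass can be a plain request that never leaves the machine. This Python sketch uses Ollama's `/api/generate` endpoint with `stream` disabled and `format` set to `json`. The prompt wording, the `{"findings": [...]}` reply shape, and the helper names are assumptions for illustration, not DataCleaner's actual prompt or schema. The default model tag is taken from the terminal demo above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt; doubled braces survive str.format as literal JSON braces.
PROMPT_TEMPLATE = (
    "List any personal data in the text below (names, medical conditions, "
    "family relationships, salary figures). Reply only with JSON of the form "
    '{{"findings": [{{"category": "...", "text": "..."}}]}}.\n\nTEXT:\n{text}'
)

def build_request(text: str, model: str = "qwen3.5:9b") -> urllib.request.Request:
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "stream": False,   # one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_findings(raw_body: bytes) -> list[dict]:
    # Ollama returns the model's text in the "response" field of its reply.
    reply = json.loads(raw_body)
    return json.loads(reply["response"]).get("findings", [])

def scan_contextual(text: str, model: str = "qwen3.5:9b") -> list[dict]:
    """Send one chunk of text to the local model and return its parsed findings."""
    with urllib.request.urlopen(build_request(text, model), timeout=120) as resp:
        return parse_findings(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both halves testable without a running model.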

Three Steps. Zero Leaks.

From document to compliance-ready output in under 3 seconds per file.

1

Point to Files

Scan any document or folder, or pipe data in from stdin. Supports PDF, DOCX, XLSX, CSV, TXT, JSON, HTML, and more.

dc scan ./contracts/
2

Dual-Pass Scan

Regex catches structured PII instantly. The local LLM then uncovers contextual data hidden in prose: names, medical data, and financial figures.

385 redactions applied
3

Audit & Comply

Redacted files saved. Timestamped audit log generated. GDPR Article 30 ready. Nothing ever left your machine.

audit_20260502.json saved
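Once both passes have produced character spans, applying redaction is mostly bookkeeping. Overlapping or touching findings should be merged so one entity does not become several markers, and replacements are easiest to apply from the end of the text backwards so earlier offsets stay valid. This Python sketch assumes findings arrive as `(start, end)` offsets and that the `block` style replaces each span with a `[REDACTED]` marker, as the terminal demo output suggests. The `merge_spans` and `redact_block` names are hypothetical.

```python
def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching (start, end) spans into non-overlapping ones."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def redact_block(text: str, spans: list[tuple[int, int]], marker: str = "[REDACTED]") -> str:
    """Replace each merged span with the marker, working right to left."""
    for start, end in reversed(merge_spans(spans)):
        text = text[:start] + marker + text[end:]
    return text
```

Merging first also means a regex hit and an LLM hit on the same phone number produce a single marker rather than a nested or duplicated one.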

Simple, Transparent Pricing

One license. Unlimited documents. No hidden fees. No per-token billing.

Free Tier
$0/month

For individuals evaluating PII detection.

  • 100 documents per month
  • Full regex detection (50+ patterns)
  • All three redaction styles
  • Audit log generation
  • Full LLM contextual scanning
  • REST API access
  • Custom PII patterns
Team
$199/month

For organizations with compliance teams.

  • Everything in Professional
  • Up to 10 users / machines
  • Centralized audit dashboard
  • SSO / SAML integration
  • Custom pattern library sharing
  • Priority support (4h response)
  • Annual billing available

All prices in USD. Payments processed securely by Paddle (our authorized Merchant of Record). 30-day money-back guarantee. No questions asked.

Built for Global Privacy Regulations

DataCleaner's local-first architecture is designed to support compliance with the world's strictest data protection laws.

🇪🇺 GDPR Compliance Statement

DataCleaner AI is designed from the ground up to support GDPR compliance. Because all data processing occurs entirely on your local machine, DataCleaner operates as a data processing tool under your exclusive control. We never act as a data controller or processor; you remain the sole data controller at all times.

Art. 5: Data Minimization
Scan exactly what you need. No data is collected, transmitted, or stored by DataCleaner.
Art. 25: Privacy by Design
Local-first architecture means privacy is the default, not an afterthought.
Art. 30: Records of Processing
Auto-generated audit logs document every scan and support your Article 30 records of processing.
Art. 32: Security of Processing
Offline operation. Cryptographic license validation. No network attack surface.
HIPAA: Technical Safeguards
Audit controls, integrity controls, and access controls built in. No PHI leaves your network.
CCPA / CPRA: Data Inventory
Discovery scans help identify and catalog personal information across your document stores.

What Our Users Say

"We process 500+ GDPR subject access requests per month. DataCleaner cut our redaction time from 3 days to 3 hours. The fact that nothing leaves our secure environment was the deciding factor."

MK
Michael K.
DPO, European FinTech

"As a solo developer shipping to EU customers, I was terrified of GDPR fines. DataCleaner runs on my RTX 4070 and catches PII I didn't even know was in my logs. Saved me from a potential €20M penalty."

SL
Sarah L.
Indie SaaS Founder

"We evaluated 7 PII tools. DataCleaner is the only one that runs completely offline. Our security team vetoed every cloud-based option. This one passed audit in a single day."

DR
David R.
CISO, Healthcare SaaS

Frequently Asked Questions

Does DataCleaner send my documents to the cloud?

No. Never. All processing happens on your local machine. DataCleaner has no cloud component, no telemetry, and makes zero outbound network calls. Even license validation is performed offline using cryptographic signatures.

What hardware do I need?

Regex-only mode runs on any computer. For full AI-powered scanning, you need a GPU with at least 8 GB of VRAM, or 8 GB of unified memory on Apple Silicon (NVIDIA RTX 3070+, Apple M1+, or AMD RX 7000+), and Ollama installed with a model like Qwen 3.5 (9B) or Llama 3.1 (8B).
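As a rough sanity check on the 8 GB figure: model weights take about parameters × bits-per-weight ÷ 8 bytes. This Python sketch assumes roughly 4-bit quantized weights (default Ollama model tags are commonly 4-bit) and a flat 1.5 GB allowance for the KV cache and runtime overhead. Both numbers are assumptions, and real usage depends on the quantization, context length, and runtime.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Weights (params x bits / 8 bytes) plus an assumed flat allowance for KV cache and runtime."""
    weight_gb = params_billion * bits_per_weight / 8  # billions of bytes, i.e. roughly GB
    return weight_gb + overhead_gb

for bits in (4, 8, 16):
    print(f"9B model at {bits}-bit: ~{estimate_vram_gb(9, bits):.1f} GB")
```

Under these assumptions the loop prints about 6.0 GB at 4-bit, 10.5 GB at 8-bit, and 19.5 GB at 16-bit, so a 9B model fits on an 8 GB card only in its quantized form.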

Is this tool GDPR compliant?

Yes. DataCleaner was designed specifically to support GDPR compliance workflows. The local-first architecture means you never lose control of your data, and the generated audit logs support your Article 30 record-keeping. See our Compliance section for details.

What is your refund policy?

We offer a 30-day money-back guarantee. If DataCleaner doesn't meet your needs, email us for a full refund, no questions asked. Payments are processed by Paddle, our authorized Merchant of Record.

How do I activate my license?

After purchase, you'll receive a license key via email. Run dc license activate YOUR-KEY in your terminal. The key is validated offline โ€” no internet required.

Ready to Lock Down Your Data?

Install in 30 seconds. First scan in under a minute. Zero configuration. Zero data leakage.

$ pip install datacleaner
$ dc scan ./documents/ --redact

Buy Pro License ($49/mo)
View on GitHub