How HalalScan is built: rules engine, keywords, and architecture

How the open-source HalalScan app decides halal status today — deterministic keywords first, AI deliberately off — and what we are considering for a future layer.

HalalScan (bilocan/halal_checker) is an open-source Flutter app that scans food barcodes and estimates whether a product is halal. This site is its web companion: same Supabase backend, same rule engine (ported to TypeScript), and community tools around products and keywords.

This post is for developers who want to understand how we solve the problem today, what we deliberately left out, and known limitations — so you can contribute without rediscovering them.

Why open source?

Halal scanning is not a casual UX problem. People change what they buy based on the label your app shows. A closed codebase asks users to trust a logo; an open one lets them read the rules, run the tests, and fix mistakes in public.

We chose open source (Flutter app, web companion) because the product is only as trustworthy as the logic behind it — and that logic should not be a black box.

Community-driven, not vendor-driven

The engine ships with curated built-in keywords, but coverage grows from the community:

Channel	What contributors do
Keyword suggestions	Propose new terms or spellings; moderators approve into Supabase `keywords`
Discussions & reports	Challenge bad product data, debate edge cases, flag wrong verdicts
GitHub PRs	Add variants and fix false positives in `ingredient_keywords.dart`; extend the tests
Rule Engine tester	Reproduce a match locally before opening an issue

Nobody needs permission from a single company to improve the list. A contributor in Malaysia can add a local E-number spelling; someone in Germany can fix an alkoholfrei false positive — both through the same review paths (moderation for DB keywords, PR review for built-in rules).

That matters for languages and markets we will never fully cover in-house. Open source turns “missing variant” from a support ticket into a pull request.

What openness buys you technically

Same rules everywhere — Dart on device, TypeScript on the web, exported keyword-rules.json in Storage; drift is visible, not hidden.
Regression tests — Shared fixtures in CI; a rule change without a test is harder to merge by accident.
Fork and audit — Mosques, student projects, or regional forks can run their own instance with their own moderation, still on the same engine.
No lock-in — Product cache and keywords live in Supabase you can self-host; the app does not depend on a proprietary rules API.

We are not claiming “open source = automatically halal.” We are claiming the mechanism should be inspectable, and improvements should come from many eyes — especially for ingredient wording that changes by country, brand, and language.

The core problem

Ingredient lists are messy: multiple languages, E-numbers, abbreviations, and vague terms like “natural flavour.” A single missed haram ingredient is worse than a false alarm, so we built around something auditable and offline-capable first.

There is also no complete product database for halal checking. Open Food Facts and our Supabase products cache help, but coverage is patchy by region and brand — many barcodes return no ingredients, stale text, or nothing at all. You cannot “solve halal” with a single downloaded DB; you have to fill gaps continuously and treat every automatic verdict as only as good as the ingredient string behind it.

That is why HalalScan is not only a rules engine on top of OFF. Our practical stack around missing data looks like this:

Gap	What we built
No ingredients on file	OCR on label photos — admins extract text from packaging images, split it into ingredient chips, and attach them to a product (web admin today; same idea for user-submitted label photos via contributions)
Crowdsourced fixes must not go live unchecked	Approval workflows — pending keyword suggestions, product corrections, ingredient contributions (from the app), and reports are reviewed before they affect shared data
Rules alone do not settle nuance	Forum — product-wide and per-ingredient discussions, often tied to an ingredient challenge from the app, so edge cases are argued in the open instead of silently “fixed” in code
Contributors need shared guardrails	Guidelines — what belongs in haram vs suspicious, when to suggest a keyword vs open a discussion, and that scans are ingredient analysis not certification (mirrored in copy on suggest/report flows and admin moderation)

Together: barcode when you can, OCR and community when you cannot, humans in the loop before shared data changes.

AI is not part of the live verdict path. Edge Functions and prompts exist in the repo as groundwork, but production scans use the rules engine (plus Open Food Facts and cached products). That is an intentional product decision, not a missing feature we forgot to ship.

High-level architecture (today)

┌─────────────────┐     ┌──────────────────────────┐     ┌─────────────────┐
│  Flutter app    │────▶│  Supabase                │◀────│  halal-checker- │
│  (on-device)    │     │  DB, Storage, community  │     │  web (Next.js)  │
└────────┬────────┘     └────────────┬─────────────┘     └────────┬────────┘
         │                           │                            │
         │  HalalRulesEngine (Dart)  │  products, keywords,       │  halal-rules-engine.ts
         │  ingredient_keywords.dart │  discussions, reports      │  + Rule Engine tester
         └───────────────────────────┴────────────────────────────┘

  Planned (not enabled): lookup-product / deep-analyze Edge Functions + Claude

Flutter owns canonical rules and runs matching on device. Supabase caches products and hosts community keywords, discussions, and moderation. This web project exposes the database, a Rule Engine tester, and admin flows for keywords and rule uploads.

How a verdict is decided today

Layer 1 — Rules engine (primary)

HalalRulesEngine in lib/services/halal_rules_engine.dart matches ingredient text against lists in lib/constants/ingredient_keywords.dart:

List	Effect
Haram	Product is not halal (alcohol, pork, gelatin, carmine, selected E-numbers, …)
Suspicious	No hard haram call; user should verify source (whey, rennet, E471, natural flavour, …)

Each canonical keyword has variants (spellings and languages) in haramVariants / suspiciousVariants. Matching uses Unicode-aware word boundaries so, for example, porcelain does not match pork.
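
A minimal TypeScript sketch of boundary matching, assuming a boundary means "not adjacent to a Unicode letter or digit". The function names and sample variants are illustrative and not taken from the engine; JavaScript's built-in `\b` only recognises ASCII word characters, so the sketch uses lookarounds instead.

```typescript
// Escape regex metacharacters so a variant is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when `variant` occurs in `text` as a whole word.
// The boundary is expressed with lookarounds on Unicode letters (\p{L})
// and digits (\p{N}), which requires the "u" flag.
function matchesWholeWord(text: string, variant: string): boolean {
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}])" + escapeRegExp(variant.toLowerCase()) + "(?![\\p{L}\\p{N}])",
    "u",
  );
  return pattern.test(text.toLowerCase());
}
```

With this definition, `rum` matches in "sugar, rum, vanilla" but not inside "serum", and the Cyrillic variant `ром` matches in "сахар, ром" but not inside "аромат".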

Special cases in code:

Fatty alcohols (cetyl, stearyl, lanolin, …) are excluded from the drinking-alcohol rule.
Negation — “alcohol-free”, “sans alcool”, “alkoholfrei”, and similar phrases are not flagged as haram alcohol.
Phrase variants use substring matching; single-word variants use boundaries (phrases are easier to over-match — see limitations below).
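
One way to implement the two alcohol exceptions is to remove known-safe phrases before testing the drinking-alcohol rule. The sketch below works under that assumption and is not the Dart engine's actual code: the `alcohol` / `alcool` / `alkohol` variant list and all names are hypothetical, and only the terms mentioned above are included.

```typescript
// Phrases that mention alcohol without meaning a drinking alcohol.
// Only terms named in this post are listed; real lists would be longer.
const SAFE_ALCOHOL_PATTERNS: RegExp[] = [
  new RegExp("(cetyl|stearyl|lanolin)\\s+alcohol", "gu"), // fatty alcohols
  new RegExp("alcohol[-\\s]free", "gu"), // negation (English)
  new RegExp("sans\\s+alcool", "gu"), // negation (French)
  new RegExp("alkoholfrei", "gu"), // negation (German)
];

// Hypothetical variants for the drinking-alcohol rule, matched as whole words.
const ALCOHOL_WORD = new RegExp(
  "(?<![\\p{L}\\p{N}])(alcohol|alcool|alkohol)(?![\\p{L}\\p{N}])",
  "u",
);

function flagsDrinkingAlcohol(ingredients: string): boolean {
  let text = ingredients.toLowerCase();
  // Blank out every safe phrase first, then look for what is left.
  for (const safe of SAFE_ALCOHOL_PATTERNS) {
    text = text.replace(safe, " ");
  }
  return ALCOHOL_WORD.test(text);
}
```

Removing the safe phrases first, rather than returning early when one is found, means a list such as "stearyl alcohol, alcohol denat." is still flagged for the second term.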

Layer 2 — Community keywords

Approved rows in Supabase keywords (from moderated keyword_suggestions) merge into the same matching logic as built-in rules, so coverage can grow without an app store release.
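
The merge can be pictured as adding approved variants to a copy of the built-in lists. The row shape below is hypothetical (the real `keywords` columns may differ), as are the type and function names; the point is only that unapproved rows never reach the matcher and built-in data is not mutated.

```typescript
type VariantMap = Map<string, string[]>; // canonical keyword -> variants

interface RuleSet {
  haram: VariantMap;
  suspicious: VariantMap;
}

// Hypothetical shape of one row in the Supabase `keywords` table.
interface CommunityKeywordRow {
  canonical: string;
  variant: string;
  list: "haram" | "suspicious";
  approved: boolean;
}

function copyVariants(source: VariantMap): VariantMap {
  const out: VariantMap = new Map();
  source.forEach((variants, canonical) => {
    out.set(canonical, variants.slice());
  });
  return out;
}

function mergeCommunityKeywords(builtIn: RuleSet, rows: CommunityKeywordRow[]): RuleSet {
  const merged: RuleSet = {
    haram: copyVariants(builtIn.haram),
    suspicious: copyVariants(builtIn.suspicious),
  };
  for (const row of rows) {
    if (!row.approved) continue; // pending or rejected rows never match
    const variant = row.variant.trim().toLowerCase();
    if (variant === "") continue;
    const variants = merged[row.list].get(row.canonical) || [];
    if (variants.indexOf(variant) === -1) variants.push(variant);
    merged[row.list].set(row.canonical, variants);
  }
  return merged;
}
```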

Layer 3 — Product data

Verdicts only matter if we have ingredients. The app loads from cache, Supabase products, or Open Food Facts. Bad or missing ingredient data limits any engine — rules included.

Built-in rules vs custom keywords

Use custom keyword (Supabase) when…	Change built-in rules (Dart) when…
Narrow addition, clear wording	Safety-critical; must work offline
Same matching logic is enough	Matching logic or exceptions need code
Came from community feedback	Multilingual variants or new exception type

Built-in rules export to keyword-rules.json (Flutter CI → Supabase Storage). The web tester fetches that file at runtime, with lib/rules.json as fallback.
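
A sketch of the runtime-fetch-with-fallback pattern, assuming the rules file is a JSON object of variant lists. The `ExportedRules` shape and the `loadRules` name are hypothetical; the downloader is passed in as a function so the example stays independent of any particular Storage URL or client.

```typescript
// Hypothetical shape of the exported rules file.
interface ExportedRules {
  haram: Record<string, string[]>;
  suspicious: Record<string, string[]>;
}

// `fetchRemote` stands in for downloading keyword-rules.json from Storage;
// `bundled` stands in for the lib/rules.json copy shipped with the site.
async function loadRules(
  fetchRemote: () => Promise<ExportedRules>,
  bundled: ExportedRules,
): Promise<{ rules: ExportedRules; source: "remote" | "bundled" }> {
  try {
    return { rules: await fetchRemote(), source: "remote" };
  } catch {
    // Network error, missing file, or invalid JSON: keep the tester usable.
    return { rules: bundled, source: "bundled" };
  }
}
```

Returning the `source` alongside the rules lets the UI say when it is running on the bundled copy, which may be older than the latest export.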

Dual engine: Dart + TypeScript

Source of truth: ingredient_keywords.dart in the Flutter repo.
Web port: lib/halal-rules-engine.ts in halal-checker-web.
Sync checks: shared test/fixtures/engine_cases.json in CI; npm run sync:check diffs exported rule JSON.

Two implementations can drift in matching logic even when keyword data matches. Fixture tests exist to catch that.
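
A fixture runner for this kind of check can be very small. The case shape below is hypothetical (the real engine_cases.json may use different fields), and `no_match` is an assumed label for "no rule fired"; each engine would be wrapped in a function with the `analyze` signature and run against the same cases.

```typescript
type CaseVerdict = "haram" | "suspicious" | "no_match";

// Hypothetical fixture shape.
interface EngineCase {
  name: string;
  ingredients: string;
  expected: CaseVerdict;
}

// Runs every case through one engine and describes each mismatch.
function runEngineCases(
  cases: EngineCase[],
  analyze: (ingredients: string) => CaseVerdict,
): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = analyze(c.ingredients);
    if (actual !== c.expected) {
      failures.push(c.name + ": expected " + c.expected + ", got " + actual);
    }
  }
  return failures;
}
```

An empty result from both engines on the same file is the parity signal; any entry names the case that drifted.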

Product lookup pipeline (Flutter)

1. Test DB (debug only)     → instant fixtures
2. SharedPreferences cache  → 30-day TTL
3. Supabase product cache   → shared DB + rules engine on ingredients
4. Open Food Facts direct   → fetch + rules engine (no AI)

ProductService orchestrates this; CacheService and SQLite scan history keep the UX usable offline.
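
The ordered lookup can be sketched in TypeScript as a list of sources tried in turn, plus a freshness check for the 30-day cache. All names here are hypothetical and the real `ProductService` is Dart; skipping a source that throws is an assumption made for the sketch, not documented behaviour.

```typescript
interface ProductRecord {
  barcode: string;
  ingredients: string;
}

interface ProductSource {
  name: string;
  lookup: (barcode: string) => Promise<ProductRecord | null>;
}

const CACHE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days, as in step 2

// A cached entry is usable only while it is younger than the TTL.
function isCacheFresh(savedAtMs: number, nowMs: number): boolean {
  return nowMs - savedAtMs < CACHE_TTL_MS;
}

// Tries each source in order and returns the first hit with its origin.
async function lookupProduct(
  barcode: string,
  sources: ProductSource[],
): Promise<{ product: ProductRecord; from: string } | null> {
  for (const source of sources) {
    try {
      const product = await source.lookup(barcode);
      if (product !== null) return { product, from: source.name };
    } catch {
      // Assumption: a failing source (e.g. no network) does not block the next one.
    }
  }
  return null;
}
```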

Community (separate from automatic verdict)

Discussions, ingredient challenges, and wrong-verdict reports live in Supabase and on this site. They do not change the automatic scan result unless a human moderation or scholar workflow acts on them. That separation matters: the app’s default label is machine-assisted ingredient checking, not a fatwa.


Scaffolding for Deep Analysis (per-ingredient AI cards, product_analyses, deep-analyze-product) exists but is not relied on while AI remains disabled.

Planned next layer: AI — is it a good idea?

We are often asked whether HalalScan should “just use AI.” Short answer: maybe, but not as the judge — and not a general chat model without guardrails.

Why AI is off for now

Concern	What it means for halal scanning
Auditability	Users and contributors need to see which rule fired. LLM outputs are harder to diff, test, and explain in court-of-public-opinion disputes.
False negatives	Missing one haram synonym is unacceptable. Models optimize for plausibility, not worst-case safety.
False positives	Over-flagging erodes trust and punishes brands unfairly.
Religious nuance	Madhhab differences, “doubtful” vs haram, and certification vs ingredients are not solved by scale alone.
Over-trust	A confident “Halal ✓” from an app logo feels like a religious endorsement. We want copy and UX that stay humble.
Ops	Latency, cost, API keys, rate limits, and vendor lock-in — fine for optional features, risky as the only path.

The rules engine is boring on purpose: same input → same output, covered by tests, readable in a PR.

If we add AI later, what role should it play?

We are unlikely to enable “AI decides halal” as layer 1. More realistic roles:

Parsing helper — Turn messy OCR or unstructured ingredient blobs into a clean token list for the rules engine (AI suggests, rules decide).
Explanation helper — Plain-language “why suspicious” text that always links back to matched keywords or “no rule matched.”
Discovery helper — Propose new variants or keywords for human approval (already how keyword_suggestions works).
Deep dive (optional) — Long-form per-ingredient notes with citations, clearly labeled supplementary, never overriding a haram keyword hit.

Non-negotiable if AI ships: known haram terms from the rules engine always win. AI cannot clear a product that matched a hard haram rule.
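
That precedence rule is easy to state in code. The sketch below is one possible reading, with hypothetical type names and AI labels: the displayed verdict always comes from the rules engine, AI output is attached only as a note, and a note that contradicts a hard haram hit is dropped.

```typescript
type RuleVerdict = "haram" | "suspicious" | "no_match";
type AiSuggestion = "likely_ok" | "needs_review" | "likely_haram";

interface CombinedResult {
  verdict: RuleVerdict; // what the app shows as the automatic result
  decidedBy: "rules"; // AI never decides in this sketch
  aiNote?: AiSuggestion; // supplementary, shown as unverified
}

function combineWithAi(rule: RuleVerdict, ai?: AiSuggestion): CombinedResult {
  const result: CombinedResult = { verdict: rule, decidedBy: "rules" };
  // A hard haram hit is final: a contradicting AI note is dropped, not shown.
  if (ai !== undefined && !(rule === "haram" && ai === "likely_ok")) {
    result.aiNote = ai;
  }
  return result;
}
```

Because `verdict` is copied from the rules engine and never derived from `ai`, there is no code path through which a model can clear a product.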

General LLM vs a small, purpose-built model

Approach	Pros	Cons
General LLM (e.g. Claude via Edge Function)	Fast to prototype; good at language and explanations; code paths already sketched in repo	Hard to regression-test; may invent ingredients or rulings; costly at scale; “trust” is branding, not proof
Small specialized model (classifier / NER: haram, suspicious, or unknown)	Cheaper inference; fixed output schema; easier to benchmark on a golden dataset	Needs curated training data and ongoing maintenance; still wrong on edge cases; does not replace scholarly judgment
Rules only (current)	Transparent, offline, community-extensible	Misses novel spellings until someone adds a variant; weak on unstructured text

Our bias: stay rules-first. If we invest in ML, prefer a narrow model (ingredient tagging, language detection, variant suggestion) over an open-ended “is this halal?” prompt. A mini model trained only on food-ingredient halal labels might be more testable than GPT-style answers — but only if we treat its output like another input to the engine, not the final verdict.

Would users trust it?

Trust comes from transparency, not model size:

Show every match: canonical keyword, reason, variant that hit (tester).
Distinguish “rule matched” vs “AI suggestion (unverified).”
Keep suggest and report loops so mistakes get fixed in data, not in prompt tweaking alone.
Never imply certification; ingredient analysis ≠ halal logo on the package.

For many Muslims, an opaque model is less trustworthy than a published keyword list they can argue with. We optimize for the published list.

Practical roadmap (draft)

Now — Harden rules, variants, community keywords, OFF data quality, web/app parity.
Next — Optional AI behind a flag: parsing + explanations only; keyword override mandatory.
Later — Evaluate a small classifier on a fixed dataset; compare against fixtures before any user-facing verdict influence.
Always — Scholar/community paths for disputes; AI does not close threads.

What we got right (so far)

Auditable rules — User-visible reasons; the transparency tester shows every match.
Offline-first safety — Built-in lists work without network or API keys.
Open data path — Community keywords, reports, rule JSON in Storage.
Multilingual variants — One canonical key, many surface forms (10 languages on the web tester).
Honest scope — We flag ingredients; we do not certify brands.

Known limitations

Phrase matching is broader than word matching — Multi-word variants use includes(); overly generic phrases can false-positive. Add tests when fixing.
“Suspicious” is not “halal” — Users must still verify source (whey, emulsifiers, etc.).
Ingredient data quality — Open Food Facts varies by region. Wrong or missing lists → wrong verdicts, regardless of engine.
Two engines, one truth — Dart rule changes without uploading keyword-rules.json (or /admin/rules) leave the web tester stale.
Religious nuance — Default to suspicious where scholars disagree; reserve hard haram for terms that are widely agreed or clearly defined (e.g. pork, alcohol as beverage).
Certification vs ingredients — A clean list does not replace a trusted halal certification on processed foods.
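
To make the phrase-matching limitation concrete, the sketch below contrasts plain substring matching with a possible narrowing that requires a non-letter, non-digit character on each side of the phrase. Both function names and the sample phrase are illustrative, and the bounded version is a suggestion rather than current engine behaviour. It only stops mid-word hits; it does nothing for phrases that are simply too generic.

```typescript
// Current behaviour as described above: phrases match as plain substrings.
function matchesPhrase(text: string, phrase: string): boolean {
  return text.toLowerCase().includes(phrase.toLowerCase());
}

// Possible narrowing: the phrase must not be glued to a letter or digit.
function matchesPhraseBounded(text: string, phrase: string): boolean {
  const escaped = phrase.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}])" + escaped + "(?![\\p{L}\\p{N}])",
    "u",
  );
  return pattern.test(text.toLowerCase());
}
```

For the phrase `rum extract`, the substring version also fires on "calf serum extract", while the bounded version fires on "vanilla, rum extract" only. A failing test built from a pair like this is a good starting point for a fix.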

Where to look in the repo

Area	Flutter path
Keyword lists & variants	`lib/constants/ingredient_keywords.dart`
Engine logic	`lib/services/halal_rules_engine.dart`
Product pipeline	`lib/services/product_service.dart`
Custom keywords	`lib/services/keyword_service.dart`
UI catalog / suggest	`lib/screens/keywords_screen.dart`

On the web: lib/halal-rules-engine.ts, app/transparency/, app/admin/keywords/, app/admin/rules/.

Contributing

False positive? Add a failing test in test/services/keyword_analysis_test.dart, then narrow the rule or add an exception.
False negative? Add variants or use /suggest; safety-critical terms should eventually land in the built-in rules.
Engine change? Update Dart and TypeScript, refresh fixtures, export rules JSON.

Questions and PRs welcome on GitHub. If you have opinions on the AI layer — especially dataset design or evaluation — open a discussion; we would rather design it in public than flip it on quietly.