HalalScan (bilocan/halal_checker) is an open-source Flutter app that scans food barcodes and estimates whether a product is halal. This site is its web companion: same Supabase backend, same rule engine (ported to TypeScript), and community tools around products and keywords.
This post is for developers who want to understand how we solve the problem today, what we deliberately left out, and which limitations are already known, so you can contribute without rediscovering them.
Why open source?
Halal scanning is not a casual UX problem. People change what they buy based on the label your app shows. A closed codebase asks users to trust a logo; an open one lets them read the rules, run the tests, and fix mistakes in public.
We chose open source (Flutter app, web companion) because the product is only as trustworthy as the logic behind it — and that logic should not be a black box.
Community-driven, not vendor-driven
The engine ships with curated built-in keywords, but coverage grows from the community:
| Channel | What contributors do |
|---|---|
| Keyword suggestions | Propose new terms or spellings; moderators approve into Supabase keywords |
| Discussions & reports | Challenge bad product data, debate edge cases, flag wrong verdicts |
| GitHub PRs | Add variants, fix false positives, extend tests in ingredient_keywords.dart |
| Rule Engine tester | Reproduce a match locally before opening an issue |
Nobody needs permission from a single company to improve the list. A contributor in Malaysia can add a local E-number spelling; someone in Germany can fix an alkoholfrei false positive — both through the same review paths (moderation for DB keywords, PR review for built-in rules).
That matters for languages and markets we will never fully cover in-house. Open source turns “missing variant” from a support ticket into a pull request.
What openness buys you technically
- Same rules everywhere: Dart on device, TypeScript on the web, exported keyword-rules.json in Storage; drift is visible, not hidden.
- Regression tests: Shared fixtures in CI; a rule change without a test is harder to merge by accident.
- Fork and audit: Mosques, student projects, or regional forks can run their own instance with their own moderation, still on the same engine.
- No lock-in: Product cache and keywords live in Supabase you can self-host; the app does not depend on a proprietary rules API.
We are not claiming “open source = automatically halal.” We are claiming the mechanism should be inspectable, and improvements should come from many eyes — especially for ingredient wording that changes by country, brand, and language.
The core problem
Ingredient lists are messy: multiple languages, E-numbers, abbreviations, and vague terms like “natural flavour.” A single missed haram ingredient is worse than a false alarm, so we built around something auditable and offline-capable first.
There is also no complete product database for halal checking. Open Food Facts and our Supabase products cache help, but coverage is patchy by region and brand — many barcodes return no ingredients, stale text, or nothing at all. You cannot “solve halal” with a single downloaded DB; you have to fill gaps continuously and treat every automatic verdict as only as good as the ingredient string behind it.
That is why HalalScan is not only a rules engine on top of OFF. Our practical stack around missing data looks like this:
| Gap | What we built |
|---|---|
| No ingredients on file | OCR on label photos — admins extract text from packaging images, split it into ingredient chips, and attach them to a product (web admin today; same idea for user-submitted label photos via contributions) |
| Crowdsourced fixes must not go live unchecked | Approval workflows — pending keyword suggestions, product corrections, ingredient contributions (from the app), and reports are reviewed before they affect shared data |
| Rules alone do not settle nuance | Forum — product-wide and per-ingredient discussions, often tied to an ingredient challenge from the app, so edge cases are argued in the open instead of silently “fixed” in code |
| Contributors need shared guardrails | Guidelines — what belongs in haram vs suspicious, when to suggest a keyword vs open a discussion, and that scans are ingredient analysis not certification (mirrored in copy on suggest/report flows and admin moderation) |
Together: barcode when you can, OCR and community when you cannot, humans in the loop before shared data changes.
AI is not part of the live verdict path. Edge Functions and prompts exist in the repo as groundwork, but production scans use the rules engine (plus Open Food Facts and cached products). That is an intentional product decision, not a missing feature we forgot to ship.
High-level architecture (today)
┌─────────────────┐     ┌──────────────────────────┐     ┌─────────────────┐
│ Flutter app     │────▶│ Supabase                 │◀────│ halal-checker-  │
│ (on-device)     │     │ DB, Storage, community   │     │ web (Next.js)   │
└────────┬────────┘     └────────────┬─────────────┘     └────────┬────────┘
         │                           │                            │
         │ HalalRulesEngine (Dart)   │ products, keywords,        │ halal-rules-engine.ts
         │ ingredient_keywords.dart  │ discussions, reports       │ + Rule Engine tester
         └───────────────────────────┴────────────────────────────┘
Planned (not enabled): lookup-product / deep-analyze Edge Functions + Claude
Flutter owns canonical rules and runs matching on device. Supabase caches products and hosts community keywords, discussions, and moderation. This web project exposes the database, a Rule Engine tester, and admin flows for keywords and rule uploads.
How a verdict is decided today
Layer 1 — Rules engine (primary)
HalalRulesEngine in lib/services/halal_rules_engine.dart matches ingredient text against lists in lib/constants/ingredient_keywords.dart:
| List | Effect |
|---|---|
| Haram | Product is not halal (alcohol, pork, gelatin, carmine, selected E-numbers, …) |
| Suspicious | No hard haram call; user should verify source (whey, rennet, E471, natural flavour, …) |
Each canonical keyword has variants (spellings and languages) in haramVariants / suspiciousVariants. Matching uses Unicode-aware word boundaries so, for example, porcelain does not match pork.
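To make the boundary rule concrete, here is a minimal TypeScript sketch of whole-word matching with Unicode property escapes. It is not the engine's actual code: `escapeRegExp`, `matchesWholeWord`, and the choice of letters, marks, and digits as "word" characters are assumptions for illustration, and so is treating `porc` as a variant of pork.

```typescript
// Escape regex metacharacters so a variant is always matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: letters, combining marks, and digits count as "word" characters.
// JavaScript's built-in \b is ASCII-based, so it would treat "é" as a boundary.
const WORD_CHAR = "[\\p{L}\\p{M}\\p{N}]";

// True when `variant` occurs in `ingredients` with no word character
// directly before or after it (case-insensitive).
function matchesWholeWord(ingredients: string, variant: string): boolean {
  const pattern = new RegExp(
    `(?<!${WORD_CHAR})${escapeRegExp(variant)}(?!${WORD_CHAR})`,
    "iu",
  );
  return pattern.test(ingredients);
}
```

With this sketch, `matchesWholeWord("porcelain glaze", "porc")` is false because a letter follows the variant, while `matchesWholeWord("viande de porc, sel", "porc")` is true.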
Special cases in code:
- Fatty alcohols (cetyl, stearyl, lanolin, …) are excluded from the drinking-alcohol rule.
- Negation — “alcohol-free”, “sans alcool”, “alkoholfrei”, and similar phrases are not flagged as haram alcohol.
- Phrase variants use substring matching; single-word variants use boundaries (phrases are easier to over-match — see limitations below).
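The special cases above can be sketched as a two-step check: remove exempt phrases first, then look for the remaining alcohol words on word boundaries. This is an illustration under assumptions, not the logic in halal_rules_engine.dart; the lists are shortened examples and `flagsDrinkingAlcohol` is a hypothetical name.

```typescript
// Illustrative lists only; the maintained ones live in ingredient_keywords.dart.
const EXEMPT_PHRASES: string[] = [
  "alcohol-free", "sans alcool", "alkoholfrei",          // negations
  "cetyl alcohol", "stearyl alcohol", "lanolin alcohol", // fatty alcohols
];
const ALCOHOL_WORDS: string[] = ["alcohol", "alcool", "alkohol"];

const NOT_WORD_BEFORE = "(?<![\\p{L}\\p{N}])";
const NOT_WORD_AFTER = "(?![\\p{L}\\p{N}])";

// True when an alcohol word remains after exempt phrases are removed.
function flagsDrinkingAlcohol(ingredients: string): boolean {
  let text = ingredients.toLowerCase();
  // Phrases are removed by plain substring replacement, which is why an
  // overly generic phrase could hide more text than intended.
  for (const phrase of EXEMPT_PHRASES) {
    text = text.split(phrase).join(" ");
  }
  // Single words are then matched on Unicode-aware boundaries.
  return ALCOHOL_WORDS.some((word) =>
    new RegExp(NOT_WORD_BEFORE + word + NOT_WORD_AFTER, "u").test(text),
  );
}
```

Removing exemptions before matching means a list such as "cetyl alcohol, ethyl alcohol" is still flagged: only the exempt phrase disappears, and the second occurrence of the word remains.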
Layer 2 — Community keywords
Approved rows in Supabase keywords (from moderated keyword_suggestions) merge into the same matching logic as built-in rules, so coverage can grow without an app store release.
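One way to picture the merge, as a sketch only: the row shape (`KeywordRow`), the `approved` flag, and the policy that community rows may add variants but never change a built-in severity are assumptions made for this example, not the actual schema or behaviour.

```typescript
type Severity = "haram" | "suspicious";

interface Rule {
  keyword: string;
  severity: Severity;
  variants: string[];
  source: "built-in" | "community";
}

// Hypothetical shape of a moderated row; the real `keywords` columns may differ.
interface KeywordRow {
  keyword: string;
  severity: Severity;
  variants: string[];
  approved: boolean;
}

// Returns a new rule list; the built-in input is left unmodified.
function mergeRules(builtIn: Rule[], rows: KeywordRow[]): Rule[] {
  const merged = new Map<string, Rule>();
  for (const rule of builtIn) {
    merged.set(rule.keyword.toLowerCase(), { ...rule, variants: [...rule.variants] });
  }
  for (const row of rows) {
    if (!row.approved) continue; // unapproved rows never reach the engine
    const key = row.keyword.toLowerCase();
    const existing = merged.get(key);
    if (existing) {
      // Assumed policy: add new variants, keep the existing severity.
      for (const variant of row.variants) {
        if (!existing.variants.includes(variant)) existing.variants.push(variant);
      }
    } else {
      merged.set(key, {
        keyword: row.keyword,
        severity: row.severity,
        variants: [...row.variants],
        source: "community",
      });
    }
  }
  return Array.from(merged.values());
}
```

Because the result has the same `Rule` shape as the built-in list, the matcher does not need to know where a keyword came from; the `source` field is kept only so a tester or UI can show it.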
Layer 3 — Product data
Verdicts only matter if we have ingredients. The app loads from cache, Supabase products, or Open Food Facts. Bad or missing ingredient data limits any engine — rules included.
Built-in rules vs custom keywords
| Use custom keyword (Supabase) when… | Change built-in rules (Dart) when… |
|---|---|
| Narrow addition, clear wording | Safety-critical; must work offline |
| Same matching logic is enough | Matching logic or exceptions need code |
| Came from community feedback | Multilingual variants or new exception type |
Built-in rules export to keyword-rules.json (Flutter CI → Supabase Storage). The web tester fetches that file at runtime, with lib/rules.json as fallback.
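The runtime-fetch-with-fallback idea can be sketched as follows. The JSON shape (`RuleFile`) is a guess for illustration, since the real layout of keyword-rules.json is not described here, and `loadRules` takes the download as an injected function so the sketch makes no assumption about the Storage URL or client library.

```typescript
// Hypothetical file shape: canonical keyword -> list of variants.
interface RuleFile {
  haram: Record<string, string[]>;
  suspicious: Record<string, string[]>;
}

function isVariantMap(value: unknown): value is Record<string, string[]> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const map = value as Record<string, unknown>;
  return Object.keys(map).every((key) => {
    const variants = map[key];
    return Array.isArray(variants) && variants.every((v) => typeof v === "string");
  });
}

function isRuleFile(value: unknown): value is RuleFile {
  if (typeof value !== "object" || value === null) return false;
  const file = value as Record<string, unknown>;
  return isVariantMap(file["haram"]) && isVariantMap(file["suspicious"]);
}

// Try the remote file first; on a network error, invalid JSON, or an
// unexpected shape, use the bundled copy instead.
async function loadRules(
  fetchText: () => Promise<string>,
  bundled: RuleFile,
): Promise<{ rules: RuleFile; source: "remote" | "bundled" }> {
  try {
    const parsed: unknown = JSON.parse(await fetchText());
    if (isRuleFile(parsed)) return { rules: parsed, source: "remote" };
  } catch {
    // Fall through to the bundled rules.
  }
  return { rules: bundled, source: "bundled" };
}
```

Returning the `source` alongside the rules lets a tester page show whether it is running on fresh or bundled data, which is the kind of visible drift the post argues for.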
Dual engine: Dart + TypeScript
- Source of truth: ingredient_keywords.dart in the Flutter repo.
- Web port: lib/halal-rules-engine.ts in halal-checker-web.
- Sync checks: shared test/fixtures/engine_cases.json in CI; npm run sync:check diffs exported rule JSON.
Two implementations can drift in matching logic even when keyword data matches. Fixture tests exist to catch that.
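A small sketch of how a shared fixture can expose that kind of drift. The case format (`EngineCase`), the verdict names, and both toy engines are invented for this example; the real fixtures in engine_cases.json may look different. The two engines share one keyword list and differ only in matching logic.

```typescript
type Verdict = "haram" | "suspicious" | "none";

// Hypothetical fixture entry.
interface EngineCase {
  name: string;
  ingredients: string;
  expected: Verdict;
}

type Engine = (ingredients: string) => Verdict;

// Runs every case through an engine and lists the mismatches.
function findDrift(cases: EngineCase[], engine: Engine): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = engine(c.ingredients);
    if (actual !== c.expected) {
      failures.push(`${c.name}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}

// Same keyword data, different matching logic.
const VARIANTS: string[] = ["pork", "porc"];
const boundaryEngine: Engine = (text) =>
  VARIANTS.some((v) => new RegExp(`(?<!\\p{L})${v}(?!\\p{L})`, "iu").test(text))
    ? "haram"
    : "none";
const substringEngine: Engine = (text) =>
  VARIANTS.some((v) => text.toLowerCase().includes(v)) ? "haram" : "none";

const cases: EngineCase[] = [
  { name: "plain pork", ingredients: "water, pork fat", expected: "haram" },
  { name: "porcelain", ingredients: "porcelain glaze", expected: "none" },
];
```

Here `findDrift(cases, boundaryEngine)` returns an empty list, while `findDrift(cases, substringEngine)` reports the porcelain case, even though both engines read identical keyword data.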
Product lookup pipeline (Flutter)
1. Test DB (debug only) → instant fixtures
2. SharedPreferences cache → 30-day TTL
3. Supabase product cache → shared DB + rules engine on ingredients
4. Open Food Facts direct → fetch + rules engine (no AI)
ProductService orchestrates this; CacheService and SQLite scan history keep the UX usable offline.
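The ordered fallback can be sketched in TypeScript as a list of sources tried in turn. This mirrors the idea rather than the Dart ProductService: the `Source` interface, `lookupProduct`, and `isFresh` are hypothetical names, and treating a failing source as a miss is an assumption.

```typescript
interface Product {
  barcode: string;
  ingredients: string;
}

// Hypothetical abstraction over cache, Supabase, and Open Food Facts lookups.
interface Source {
  name: string;
  lookup: (barcode: string) => Promise<Product | null>;
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// A local cache entry counts as usable while it is younger than 30 days.
function isFresh(storedAtMs: number, nowMs: number): boolean {
  return nowMs - storedAtMs < THIRTY_DAYS_MS;
}

// Tries each source in order and returns the first hit with its origin.
async function lookupProduct(
  barcode: string,
  sources: Source[],
): Promise<{ product: Product; from: string } | null> {
  for (const source of sources) {
    try {
      const product = await source.lookup(barcode);
      if (product !== null) return { product, from: source.name };
    } catch {
      // Assumption: an unreachable source is a miss, so the next one is tried.
    }
  }
  return null;
}
```

Keeping the order in data rather than in nested conditionals makes it easy to drop the debug-only source in release builds or to reorder sources in a fork.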
Community (separate from automatic verdict)
Discussions, ingredient challenges, and wrong-verdict reports live in Supabase and on this site. They do not change the automatic scan result unless a human moderation or scholar workflow acts on them. That separation matters: the app’s default label is machine-assisted ingredient checking, not a fatwa.
Scaffolding for Deep Analysis (per-ingredient AI cards, product_analyses, deep-analyze-product) exists but is not relied on while AI remains disabled.
Planned next layer: AI — is it a good idea?
We are often asked whether HalalScan should “just use AI.” Short answer: maybe, but not as the judge — and not a general chat model without guardrails.
Why AI is off for now
| Concern | What it means for halal scanning |
|---|---|
| Auditability | Users and contributors need to see which rule fired. LLM outputs are harder to diff, test, and explain in court-of-public-opinion disputes. |
| False negatives | Missing one haram synonym is unacceptable. Models optimize for plausibility, not worst-case safety. |
| False positives | Over-flagging erodes trust and punishes brands unfairly. |
| Religious nuance | Madhhab differences, “doubtful” vs haram, and certification vs ingredients are not solved by scale alone. |
| Over-trust | A confident “Halal ✓” from an app logo feels like a religious endorsement. We want copy and UX that stay humble. |
| Ops | Latency, cost, API keys, rate limits, and vendor lock-in — fine for optional features, risky as the only path. |
The rules engine is boring on purpose: same input → same output, covered by tests, readable in a PR.
If we add AI later, what role should it play?
We are unlikely to enable “AI decides halal” as layer 1. More realistic roles:
- Parsing helper: Turn messy OCR or unstructured ingredient blobs into a clean token list for the rules engine (AI suggests, rules decide).
- Explanation helper: Plain-language “why suspicious” text that always links back to matched keywords or “no rule matched.”
- Discovery helper: Propose new variants or keywords for human approval (already how keyword_suggestions works).
- Deep dive (optional): Long-form per-ingredient notes with citations, clearly labeled supplementary, never overriding a haram keyword hit.
Non-negotiable if AI ships: known haram terms from the rules engine always win. AI cannot clear a product that matched a hard haram rule.
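As a sketch of what that constraint could look like in code (nothing like this is enabled today): the rule verdict is always carried through unchanged, and an AI output can only attach a clearly labeled note. `combine`, `AiSuggestion`, and the label text are hypothetical.

```typescript
type RuleVerdict = "haram" | "suspicious" | "none";
type AiSuggestion = "likely_ok" | "needs_review";

interface FinalLabel {
  verdict: RuleVerdict;  // always the rules engine's result
  aiNote: string | null; // supplementary text, never a verdict
}

function combine(rule: RuleVerdict, ai: AiSuggestion | null): FinalLabel {
  // A hard haram hit is final: the AI output is not even shown.
  if (rule === "haram") return { verdict: "haram", aiNote: null };
  if (ai === null) return { verdict: rule, aiNote: null };
  // Otherwise the AI may add a note, but the verdict stays as the rules set it.
  const text = ai === "needs_review" ? "needs review" : "no concern found";
  return { verdict: rule, aiNote: `AI suggestion (unverified): ${text}` };
}
```

Because `verdict` is copied from the rules result in every branch, there is no code path in this sketch where a model output can clear or downgrade a match.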
General LLM vs a small, purpose-built model
| Approach | Pros | Cons |
|---|---|---|
| General LLM (e.g. Claude via Edge Function) | Fast to prototype; good at language and explanations; code paths already sketched in repo | Hard to regression-test; may invent ingredients or rulings; costly at scale; “trust” is branding, not proof |
| Small specialized model (classifier / NER: haram, suspicious, or unknown) | Cheaper inference; fixed output schema; easier to benchmark on a golden dataset | Needs curated training data and ongoing maintenance; still wrong on edge cases; does not replace scholarly judgment |
| Rules only (current) | Transparent, offline, community-extensible | Misses novel spellings until someone adds a variant; weak on unstructured text |
Our bias: stay rules-first. If we invest in ML, prefer a narrow model (ingredient tagging, language detection, variant suggestion) over an open-ended “is this halal?” prompt. A mini model trained only on food-ingredient halal labels might be more testable than GPT-style answers — but only if we treat its output like another input to the engine, not the final verdict.
Would users trust it?
Trust comes from transparency, not model size:
- Show every match: canonical keyword, reason, variant that hit (tester).
- Distinguish “rule matched” vs “AI suggestion (unverified).”
- Keep suggest and report loops so mistakes get fixed in data, not in prompt tweaking alone.
- Never imply certification; ingredient analysis ≠ halal logo on the package.
For many Muslims, an opaque model is less trustworthy than a published keyword list they can argue with. We optimize for the second.
Practical roadmap (draft)
- Now — Harden rules, variants, community keywords, OFF data quality, web/app parity.
- Next — Optional AI behind a flag: parsing + explanations only; keyword override mandatory.
- Later — Evaluate a small classifier on a fixed dataset; compare against fixtures before any user-facing verdict influence.
- Always — Scholar/community paths for disputes; AI does not close threads.
What we got right (so far)
- Auditable rules — User-visible reasons; the transparency tester shows every match.
- Offline-first safety — Built-in lists work without network or API keys.
- Open data path — Community keywords, reports, rule JSON in Storage.
- Multilingual variants — One canonical key, many surface forms (10 languages on the web tester).
- Honest scope — We flag ingredients; we do not certify brands.
Known limitations
- Phrase matching is broader than word matching: Multi-word variants use includes(); overly generic phrases can false-positive. Add tests when fixing.
- “Suspicious” is not “halal”: Users must still verify source (whey, emulsifiers, etc.).
- Ingredient data quality: Open Food Facts varies by region. Wrong or missing lists → wrong verdicts, regardless of engine.
- Two engines, one truth: Dart rule changes without uploading keyword-rules.json (or /admin/rules) leave the web tester stale.
- Religious nuance: Automate suspicious where scholars disagree; avoid hard haram unless widely agreed or clearly defined (e.g. pork, alcohol as beverage).
- Certification vs ingredients: A clean list does not replace a trusted halal certification on processed foods.
Where to look in the repo
| Area | Flutter path |
|---|---|
| Keyword lists & variants | lib/constants/ingredient_keywords.dart |
| Engine logic | lib/services/halal_rules_engine.dart |
| Product pipeline | lib/services/product_service.dart |
| Custom keywords | lib/services/keyword_service.dart |
| UI catalog / suggest | lib/screens/keywords_screen.dart |
On the web: lib/halal-rules-engine.ts, app/transparency/, app/admin/keywords/, app/admin/rules/.
Contributing
- False positive? Add a failing test in test/services/keyword_analysis_test.dart, then narrow the rule or add an exception.
- False negative? Add variants or use /suggest; safety-critical terms should land in built-in rules eventually.
- Engine change? Update Dart and TypeScript, refresh fixtures, export rules JSON.
Questions and PRs welcome on GitHub. If you have opinions on the AI layer — especially dataset design or evaluation — open a discussion; we would rather design it in public than flip it on quietly.