Section 7 — Data Schema
Platform Data Structure
Every report metric maps to a specific field · segmentation derived from units[] array
projects — master record
{
"project_id": "uuid-v4",
"name": "Origin Plug & Play",
"developer": "Origin Property Public Co.",
"district": "Bangtao",
"sub_district": "Cherngtalay",
"product_type": "condo",
"freehold_available": true,
"units_total": 288,
"floors": 8,
"buildings": 3,
"status": "selling",
"sales_launch_date": "2024-01-15",
"completion_date": "2026-09-01",
"has_sales_kit": true,
"has_contacts": true,
"last_price_date": "2026-03-05",
"data_quality_score": 87,
"trust_index": 89
}
source_files — file journal (physical copy guard)
{
"id": "uuid-v4",
"project_id": "ref → projects",
"received_at": "2026-03-05T09:12:00Z",
"file_date": "2026-03-04",
"gdrive_url_original": "developer's folder",
"gdrive_url_copy": "/incoming/origin_pp_20260305.pdf",
"file_format": "pdf",
"source_channel": "whatsapp",
"parse_status": "ok",
"changed_vs_previous": true,
"confidence_avg": 0.94
}
price_snapshots — immutable record per update
{
"snapshot_id": "uuid-v4",
"project_id": "ref → projects",
"snapshot_date": "2026-03-05",
"source_file_id": "ref → source_files",
"units_total": 288,
"units_available": 41,
"units_sold": 247,
"units_reserved": 0,
"psm_min": 88000,
"psm_max": 142000,
"psm_median": 103000,
"psm_delta_vs_prev": 500,
"band_low_count": 23,
"band_mid_count": 181,
"band_high_count": 63,
"band_lux_count": 21,
"segment_assigned": "mid",
"segment_changed": false,
"confidence_score": 0.91,
"cancellation_detected": false,
"promo_active": false
}
units — per-unit row (separate table, not JSONB)
{
"unit_ref": "A-301",
"unit_type": "1BR",
"area_sqm": 68.5,
"land_sqm": null,
"floor": 3,
"view": "pool",
"ownership_type": "leasehold",
"price_thb": 9800000,
"price_list_thb": 10200000,
"price_promo_thb": null,
"price_per_sqm": 143066,
"status": "available"
}
Derived fields — computed by Agent A4
segment = band with count ≥ ceil(units_total × 0.51)
→ if no band qualifies: segment = "mixed", shown in all applicable
absorption_rate_monthly =
(units_sold / units_total) / months_since_launch × 100
psm_delta_weekly =
(snapshot_curr.psm_median − snapshot_prev.psm_median) / snapshot_prev.psm_median × 100
segment_crossing_flag =
snapshot_curr.segment_assigned ≠ snapshot_prev.segment_assigned → anomaly: "segment_reclassified"
data_quality_score =
has_price_list_fresh_14d(+25) + has_unit_breakdown(+20)
+ has_launch_date(+15) + developer_verified(+15)
+ units_total_known(+10) + has_contacts(+10) + has_completion_date(+5)
trust_index =
llm_provider = "Claude API"
5 tables in PostgreSQL: projects · source_files · price_snapshots · units · contacts. Segment assigned per snapshot — so segment history is preserved. source_files stores both the original developer link and our physical Drive copy — data never lost if developer deletes their folder. Existing Sales Kits → parse into price_snapshots with snapshot_date = file date.
Telegram Bot — Command Reference
/report
/status
/run [agent]
/new_project [url]
/passport [name|id]
/passport_compare [p1] [p2]
/passport_list [district]
Implementation phases: Фаза 0.5 (Excel cleanup → Supabase import · 3–5 days) → Фаза 0 (infra: Python · Supabase 5 tables · Sheets API · Telegram skeleton · Claude API · 1–2 wks) → Фаза 1 (A1 scraper + A1b · 3–4 wks) → Фаза 2 (A3 parser + A2b link extractor · 3–4 wks) → Фаза 3 (A4 analytics + first passports + A5 · 2–3 wks) → Фаза 4 (A2 communicator · 2–3 wks) → Фаза 5 (weekly cadence · alerts · /passport · 2–3 wks). Total: ~3–4 months to production.