# kcolbchain PR / issue pipeline

Agentic routing layer for the kcolbchain GitHub org. Runs on the VPS via cron. Pure-stdlib Python + small Anthropic API calls. **Designed so Claude Code sessions don't re-discover it** — this README is the single source of truth.

## What this exists to solve

Every day contributors open PRs and issues across ~20 public kcolbchain repos. Manually reviewing every one bloats Claude Code context and wastes credits. This pipeline:

1. Triages every new PR/issue within 30 min (labels, welcomes first-timers).
2. Runs a small Claude Haiku audit on external PRs above a size threshold.
3. Auto-merges PRs that pass both the Haiku audit and GitHub's mergeability check.
4. Closes stale, stalled, duplicate work with a template comment.
5. Pushes a daily digest of org activity to Telegram.

Everything else (human review, architecture discussion, real feature PRs from `abhicris`) still flows through Claude Code. The pipeline just filters the noise.

## Tiers

| Tier | Runs | Uses Claude? | Purpose |
|------|------|--------------|---------|
| 1. Triage | every 30 min | no | Label by path, welcome first-timers, assign maintainer |
| 2. Audit | every 2 h | Haiku 4.5 | Security + quality verdict on external PRs |
| 3. Merge | every 2 h (after audit) | no | Merge audited + CLEAN PRs, post thank-you |
| 4. Stale | daily | no | Close PRs/issues with no activity > 21 days |
| 5. Digest | daily | no | Activity summary → Telegram |
| 6. Community | daily | no | Rebuild `data/contributors.json`, `data/open-invitations.json` |

## Files

| File | Role |
|------|------|
| `lib_gh.py` | Minimal GitHub REST v3 client (urllib, paginates, no deps) |
| `lib_claude.py` | Minimal Anthropic API client (urllib, JSON-mode, no deps) |
| `triage.py` | Tier 1 — runs on every scheduled tick |
| `audit.py` | Tier 2 — Haiku audit; writes `pipeline-verdict: {safe,needs_human,reject}` label |
| `merge_safe.py` | Tier 3 — merges PRs labelled `pipeline-verdict: safe` + state CLEAN |
| `stale.py` | Tier 4 — closes stale issues and PRs |
| `digest.py` | Tier 5 — extends `daily-report.sh` with PR/issue/contributor counts |
| `contributors.py` | Tier 6 — rebuilds `data/contributors.json` + `contributor-stats.json` |
| `invitations.py` | Tier 6 — rebuilds `data/open-invitations.json` (help-wanted / good-first-issue) |
| `run.sh` | Dispatcher; cron points here |

## Environment (.env on VPS, never commit)

```
GH_TOKEN=<fine-grained, org:kcolbchain, repo:write + issues:write>
ANTHROPIC_API_KEY=<sk-ant-...>
TELEGRAM_BOT_TOKEN=<existing>
TELEGRAM_CHAT_ID=<existing>
PIPELINE_DRY_RUN=0        # set to 1 to disable all mutations
PIPELINE_MAX_DIFF_KB=64   # skip audit for larger diffs (too expensive)
```

## Routing rules (PRs)

```
author in {abhicris, dependabot[bot]}          → skip audit, label "trusted", merge if CLEAN + tests pass
author in {known contributors}                 → Tier 2 audit; auto-merge on verdict=safe
first-timer (≤ 1 prior PR in org)              → Tier 1 "first-timer-welcome" comment, then Tier 2 audit
diff size > PIPELINE_MAX_DIFF_KB               → label "needs-human-review", skip Tier 2
diff touches .github/workflows/** or setup.py  → label "needs-human-review", skip Tier 2 (supply chain)
docs-only (*.md, README*, docs/**)             → Tier 2 skipped; auto-merge with label "docs-only"
```

## Routing rules (issues)

```
title matches /bug|crash|error/i               → label "bug"
title matches /feature|enhancement|add/i       → label "enhancement"
body mentions specific file/module             → @-mention last git-blame author (future)
no activity 21d + no label "pinned"            → close with stale template
```

## Labels the pipeline owns

- `pipeline-verdict: safe` — Tier 2 passed
- `pipeline-verdict: needs-human` — Tier 2 flagged human review
- `pipeline-verdict: reject` — Tier 2 found a red flag
- `pipeline: trusted-author` — abhicris / dependabot / kcolbchain org members
- `pipeline: docs-only` — only *.md changes
- `pipeline: first-timer-welcome` — comment posted
- `pipeline: stale` — flagged for closure

## How Claude Code should interact with this

**You (the Claude Code session) do NOT run this pipeline.** It runs on the VPS on cron. When a user asks about PR/issue state:

1. Check `/srv/kcolbchain/main/reports/pipeline/YYYY-MM-DD.json` for the last digest.
2. If you need the latest state, run `gh` directly — do not re-implement what's here.
3. If you need to change pipeline behaviour, edit this dir and commit. `deploy.sh` + cron pick it up.

## Cost envelope (Haiku 4.5)

~5k input + 500 output per audit ≈ $0.004. At 50 external PRs/week ≈ $0.20/week. Hard cap via `PIPELINE_MAX_DIFF_KB`.

## Cron entries

```
# PR/issue triage (Tier 1)
*/30 * * * *  bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh triage  >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1

# Claude audit (Tier 2) + merge (Tier 3) — runs sequentially
15 */2 * * *  bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh audit  >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1
30 */2 * * *  bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh merge  >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1

# Stale + digest (daily)
0 3 * * *     bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh stale  >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1
15 3 * * *    bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh digest >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1

# Community ledger (daily, after digest)
30 3 * * *    bash /srv/kcolbchain/main/deploy/pr-pipeline/run.sh community >> /srv/kcolbchain/main/deploy/pr-pipeline.log 2>&1
```
