Live Orchestrator View

Colin

Colin is a Go service that watches a Linear project, runs Codex in per-issue workspaces, and hands off review-ready changes. View the GitHub repository.

Shell Render
2026-03-30T21:15:55.132169Z

Current task surface

HTMX keeps this fragment fresh without reloading the full page shell.

Live data2026-03-30T21:15:55Z
Running
1
active issue workspaces
Retrying
0
queued follow-up attempts
Total Tokens
0
aggregate across active runs
Run Seconds
1150
combined wall clock

Linear issues

Tracked Linear issues in the active handoff pipeline.

Backlog
0
Issue is parked outside the active handoff states.
Todo
0
Issue is ready for Colin to pick up.
In Progress
1
Issue is actively being worked.
Refine
0
Issue needs human clarification before a PR can be reviewed.
Review
0
Issue has a PR and is awaiting human review.
Merge
2
Issue is approved and waiting to be merged.

Running tasks

Each worker card shows its live state, token usage, and an expandable Codex event stream.

COLIN-128codex PR review labels
In Progress

Session

Session: 019d4090-2af3-7142-825f-03f67bfb47b0-019d4094-decd-7251-9b47-7f1da8f1edfb
Turns: 4
Started: 2026-03-30T21:04:41Z

Activity

no message
Last event at: 2026-03-30T21:15:28Z

Usage

Turns: 4
Input: 0
Output: 0
Total: 0
Codex output
21:15:20 UTCother_message
The service end-to-end test regressed. I’m running that test alone with verbose output so I can see whether the new startup label ensures or the poll-loop sync changed timing or introduced a hidden startup failure.
21:15:07 UTCother_message
The focused suite is still running. I’m polling for the remaining package results now; if anything else fails, I’ll fix it before widening to `go test ./...`.
21:14:51 UTCother_message
The first test run surfaced one bad assumption: the “approval pending” helper lives in the Codex runner, not in `repoops`. I’m inlining the comparison logic into the new repo review-state helper so the dependency direction stays clean.
21:14:41 UTCother_message
`gofmt` only failed because I included `README.md`. I’m rerunning it on the Go files only, then I’ll execute the focused test set.
21:14:36 UTCother_message
I’m at the verification phase. I’ve got the core logic and tests in place, and I’m formatting before running the targeted Go suites so any failures point to behavior rather than syntax noise.
21:14:23 UTCother_message
I’ve covered the new logic with unit tests. I’m doing the docs update now, then I’ll format and run the focused Go test set to catch any interface drift or missed imports.
21:12:48 UTCother_message
I’m adding the unit tests around the new removal path first. That gives me a tight feedback loop before I add broader sync behavior on top of it.
21:12:25 UTCother_message
The next pass is tests and the fake Linear server. I need those in place before I can safely refactor sync behavior, because the current fake server can’t actually show label convergence yet.
21:11:37 UTCother_message
I’m adding the merge cleanup in the narrowest place: `buildMergedResult`, after the issue reaches its post-merge state but before metadata is persisted. That keeps the cleanup local to successful merge completion.
21:11:12 UTCother_message
I’m wiring the poll-loop sync now. The intent is best-effort convergence: use the existing tracked-issue fetch, classify Codex review state only when there is a PR signal, and aggressively clear stale managed labels when there isn’t.
21:10:33 UTCother_message
I’m editing the Linear client now. The main change here is a no-op-safe removal path so later sync code can converge labels instead of only appending more state.
21:10:18 UTCother_message
I’ve got the integration points. I’m editing the domain, tracker, and repo review-state helpers first, then I’ll wire orchestration and merge cleanup around them and finish with the fake server/test updates.
21:10:16 UTCother_message
I’ve narrowed the edit set. Next I’m checking the merge-success helper and existing runner/orchestrator tests so I can add the smallest cleanup hook and keep the stubs in sync with the tracker interface change.
21:10:08 UTCother_message
I’ve confirmed the tree is clean. I’m reading the exact call sites now so I can thread removal and sync through the existing startup, poll loop, and merge-success paths without widening the change.
21:10:00 UTCother_message
I’m rechecking the worktree and the relevant files before editing so I don’t clobber any changes made since the last turn, then I’ll implement the label lifecycle end to end.
21:09:51 UTCother_message
You are working on Linear issue COLIN-128: codex PR review labels.

Repository rules:
- Follow `AGENTS.md`.
- Make changes only in this repository workspace.
- Prefer the smallest correct change that resolves the issue.
- Run relevant Go tests before you finish.

Issue context:
- State: In Progress
- URL: https://linear.app/bothnia/issue/COLIN-128/codex-pr-review-labels
Issue description:

we should mirror the github PR review status by codex using linear issue labels, so it is possible to see what state an issue is in when I look at the linear board. right now I can't see if an issue is undergoing review, if the review has finished, or if there are unresolved feedback - surface this as labels, and verify if the labels exist when starting (and create the missing ones)

Definition of done:
- Implement the requested change.
- Add or update tests when behavior changes.
- Leave the repo in a clean, reviewable state ready for `Review`.
- Do not merge changes yourself during coding turns; Colin will publish in `Review` and merge in `Merge`.
- Summarize what changed, what was tested, and any remaining risk.

Output contract:
- If the issue is still too underspecified to implement safely, begin your final response with `COLIN_OUTCOME: NEEDS_SPEC`.
- After `COLIN_OUTCOME: NEEDS_SPEC`, explain what information is missing and include the exact sentence `The spec should be improved before implementation.`
- If the issue is implementable, begin your final response with `COLIN_OUTCOME: READY_FOR_REVIEW`.
- `Review` is PR-only. Clarification-only handoffs go to `Refine`.

ExecPlan:

I’m gathering the exact extension points and existing tests so the plan can tell a novice precisely where to edit, what behavior to add, and how to prove it works.

I’ve confirmed the repository already recognizes Codex review signals from GitHub reactions, which makes label mirroring feasible, but there is still no general label-removal path. I’m checking the tracker test scaffolding next because the plan needs to call out every test fixture that must evolve.

I have the likely implementation shape now. I’m checking the Linear client’s label-creation semantics so the plan can state whether startup should ensure labels globally or per-team and whether the existing helper is enough.

The remaining gap is the actual Codex review lifecycle inside the runner, because the new labels need to mirror that lifecycle rather than generic PR state. I’m reading those branches now so the plan can define an exact label state machine.

I’ve pinned down the three Codex review states already present in behavior: review requested via `EYES`, review approved via `THUMBS_UP`, and unresolved Codex review threads. That is enough to define concrete Linear labels instead of inventing a new workflow.

I have enough to write the plan. The main design choice is to make the labels a single managed state machine with explicit precedence and stale-label cleanup, because startup-only creation without periodic syncing would leave the Linear board wrong as GitHub review status changes.

One last detail matters for the plan: the fake Linear server currently knows label definitions but does not model per-issue label add/remove state beyond the static issue payload. I’m checking that so the ExecPlan can spell out the fixture changes instead of hand-waving them.

# Mirror Codex PR Review Status Into Linear Labels

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

## Purpose / Big Picture

After this change, a person looking at the Linear board will be able to tell, from issue labels alone, whether Codex review for the linked GitHub pull request is still in progress, has finished successfully, or still has unresolved Codex feedback. Colin will also create any missing review-status labels at startup so the sync does not silently fail on a new workspace or a newly configured Linear team.

The visible result is simple. For tracked issues with a linked pull request, Colin will keep exactly one of these labels in sync with GitHub Codex review status: `codex-review: pending`, `codex-review: approved`, or `codex-review: unresolved-feedback`. If there is no current Codex review status to show, Colin will remove those managed labels so stale board state does not linger.

## Progress

- [ ] Add managed Codex review label constants and helper functions in the shared domain layer.
- [ ] Extend the tracker interface and Linear client with safe label removal support.
- [ ] Generalize service startup label initialization so Colin ensures `paused` plus all managed Codex review labels exist.
- [ ] Add a single, explicit Codex review state machine based on existing GitHub review signals.
- [ ] Add periodic best-effort label syncing for tracked issues that already have a GitHub pull request.
- [ ] Clear managed Codex review labels when a pull request is merged and the issue leaves review-related states.
- [ ] Update fake tracker servers and test doubles so add/remove label mutations change per-issue label state.
- [ ] Add focused unit tests for label creation, label removal, review-state mapping, and label sync precedence.
- [ ] Add or extend service-level fake end-to-end coverage to prove label sync behavior is observable.
- [ ] Update `README.md` so operators know which labels Colin manages and what each label means.

## Surprises & Discoveries

* Observation: Colin already has the GitHub-side data needed for this feature. `internal/repoops/manager.go` already computes three Codex review signals inside `ReviewContext`: unresolved Codex review threads, the latest `EYES` reaction from `chatgpt-codex-connector[bot]`, and the latest `THUMBS_UP` reaction from the same bot.  
  Evidence: `ReviewContext` already exposes `CodexReviewThreads`, `CodexReviewRequestedAt`, and `CodexReviewApprovedAt`, and `internal/agent/codex/runner.go` already uses those fields to block merge when Codex review is still pending.
* Observation: Colin can create and add Linear labels today, but it cannot remove them. Without removal support, any review label introduced by this feature would become stale as soon as GitHub state changes.  
  Evidence: `internal/tracker/tracker.go` only exposes `EnsureIssueLabel` and `AddIssueLabel`, and `internal/tracker/linear/client.go` implements only `issueAddLabel`-style mutations.
* Observation: startup currently ensures only the `paused` label exists. The new labels must be added to the same startup path or the first sync pass will depend on manual Linear setup.  
  Evidence: `internal/service/service.go` has a dedicated `ensurePausedLabel` helper called from `Service.Run`.
* Observation: the fake Linear server used by service tests does not currently persist per-issue label changes; it returns a static `e2e` label on every issue snapshot.  
  Evidence: `internal/service/e2e_test.go` stores label definitions in `fakeLinearServer.labels`, but `issueNode` always returns `labels.nodes = [{"name":"e2e"}]`.

## Decision Log

* Decision: Manage exactly three mutually exclusive Linear labels: `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback`.  
  Rationale: The issue asks for three user-visible states. A single managed state dimension is easier to understand on the Linear board than a combinatorial set of overlapping labels.  
  Date/Author: 2026-03-30 / Codex
* Decision: Give `unresolved-feedback` the highest precedence, then `pending`, then `approved`, and otherwise show no managed Codex review label.  
  Rationale: unresolved feedback is the most actionable state; if Codex review threads are still open, the board should say that even if an earlier reaction history also contains `EYES` or `THUMBS_UP`.  
  Date/Author: 2026-03-30 / Codex
* Decision: Sync review labels best-effort during the orchestrator poll loop for tracked non-terminal states, instead of only when a runner happens to touch the issue.  
  Rationale: the user wants board visibility. That requires the labels to update while an issue sits idle in `Review` or `Merge`, not only when another run starts.  
  Date/Author: 2026-03-30 / Codex
* Decision: Clear all managed Codex review labels explicitly on successful merge, even if the periodic sync does not inspect terminal states.  
  Rationale: otherwise a merged issue could retain a stale review label forever after leaving Colin’s tracked handoff states.  
  Date/Author: 2026-03-30 / Codex

## Outcomes & Retrospective

Implementation has not started yet. A successful outcome will mean that a tracked issue with a linked pull request can move through these observable board states without manual label editing: pending review, approved review, unresolved feedback, and finally no managed review label after merge or when no GitHub Codex review signal exists.

When this plan is executed, update this section with the exact tests added, any deviations from the initial label taxonomy, and whether the final implementation kept sync entirely inside the orchestrator or needed a second sync hook elsewhere.

## Context and Orientation

Colin is a long-running Go service. It polls Linear, prepares per-issue workspaces, asks GitHub for pull request state, and runs Codex against issues in active Linear states. The code relevant to this feature is split across a few packages.

`internal/service/service.go` is the startup layer. It loads runtime dependencies and currently ensures the `paused` label exists before the orchestrator starts. This is the right place to ensure the new managed review labels exist too.

`internal/tracker/tracker.go` defines the tracker interface used by both the orchestrator and the runner. `internal/tracker/linear/client.go` is the Linear GraphQL implementation. This code already knows how to find a label by name, create a missing label, and add a label to an issue. It needs symmetrical removal support so Colin can remove stale review labels.

`internal/repoops/manager.go` is the GitHub adapter. Its `ReviewContext` type already exposes everything needed to classify Codex review state: unresolved Codex review threads, the latest Codex `EYES` reaction, and the latest Codex `THUMBS_UP` reaction.

`internal/orchestrator/orchestrator.go` owns the poll loop. It is the right place to perform best-effort periodic syncing from GitHub review state into Linear labels, because it already visits tracked issues continuously.

`internal/agent/codex/runner.go` handles publish and merge flows. Merge success is the one place where an issue can leave review-related states permanently, so this file is the right place for explicit terminal cleanup of managed review labels.

The main tests that must move with this feature are in `internal/tracker/linear/client_test.go`, `internal/repoops/manager_test.go`, `internal/orchestrator/orchestrator_test.go`, and `internal/service/e2e_test.go`. The last file is especially important because it uses fake Linear and fake GitHub behavior to prove Colin’s end-to-end orchestration.

## Plan of Work

### Milestone 1: Add label primitives and symmetric Linear label operations

Start by making label management a first-class concept. In `internal/domain/types.go`, add constants for the three managed Codex review labels and a helper that returns the full slice of managed review labels in a stable order. Keep the existing `PausedIssueLabel` constant; this change is additive.

Then extend `internal/tracker/tracker.go` with a `RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error` method. Implement it in `internal/tracker/linear/client.go`. Do not make removal create missing labels. Removal should first look up the existing label ID by name, treat a missing label definition as a no-op, and only call the Linear mutation when the label exists. This avoids the nonsense case where a removal request creates a label just so it can delete it from one issue.

While editing `internal/service/service.go`, replace `ensurePausedLabel` with a more general helper that ensures `paused` plus all managed Codex review labels exist during service startup. Keep the startup behavior idempotent: rerunning Colin should only reuse existing labels, not duplicate them.

The independently verifiable result of this milestone is that the tracker can add and remove named labels safely, and service startup ensures the label definitions exist before sync logic runs. Prove this with focused tests in `internal/tracker/linear/client_test.go` and a service-level startup test or e2e assertion that the fake Linear server sees creation requests for missing managed labels.

### Milestone 2: Define one Codex review state machine and sync it during polling

Next, define one explicit state machine for Codex review status. Put it in a location both orchestration code and runner code can use without circular imports. A small helper near `internal/repoops/manager.go` is a good fit because `ReviewContext` already lives there. Add a type such as `CodexReviewState` with values for `none`, `pending`, `approved`, and `unresolved_feedback`, plus a helper that maps that state to the exact Linear label name.

The classification rules must be:

If `len(reviewContext.CodexReviewThreads) > 0`, the state is `unresolved_feedback`.

Otherwise, if there is a latest Codex `EYES` reaction and there is no later Codex `THUMBS_UP` reaction, the state is `pending`.

Otherwise, if there is a latest Codex `EYES` reaction and there is a later Codex `THUMBS_UP` reaction, the state is `approved`.

Otherwise, the state is `none`.

After that, add a new best-effort label sync pass to the orchestrator poll loop. The goal is not to block dispatch when GitHub or Linear blips. The goal is to keep the board reasonably fresh. The sync should inspect tracked non-terminal issues, skip issues with no pull request signal, call `Repo.ReviewContext` only for issues that already have an associated pull request, compute the desired managed label, and then make Linear match that desired state by adding the desired label and removing the other managed review labels. If the desired state is `none`, remove all managed review labels.

The simplest place to invoke this is from `internal/orchestrator/orchestrator.go` inside `handleTick`, near the existing state refresh path. Keep the sync isolated in a helper such as `syncCodexReviewLabels(ctx, issues []domain.Issue)` so it can be tested directly. If needed, refactor `refreshIssueStateCounts` to return the fetched issue slice so the same data can drive both board statistics and review-label syncing without a second Linear query.

The independently verifiable result of this milestone is that, given a domain issue and a fake `ReviewContext`, Colin computes one exact managed label and removes stale alternatives. Prove this with new orchestrator tests that cover all precedence cases: unresolved feedback wins over reactions, a later approval beats an older review request, and no signal clears all managed review labels.

### Milestone 3: Clean up labels on merge, update fakes, and document the behavior

Finally, make the feature durable and observable in the existing test harness. In `internal/agent/codex/runner.go`, add a best-effort cleanup call on successful merge that removes all managed Codex review labels before or immediately after the issue transitions out of merge-related states. Do not fail a successful merge if label cleanup fails; log a warning and continue.

Update `internal/service/e2e_test.go` so `fakeLinearServer` stores per-issue labels instead of returning a hard-coded `e2e` label forever. `issueAddLabel` must update the issue’s current label set, and the new `issueRemoveLabel` mutation must remove from that same set. This fake server is the only way to prove the full orchestration path without a real Linear workspace.

Then add at least one service-level scenario that shows the board-facing behavior. It does not need to hit real GitHub. Use the existing fake GitHub reaction and review thread files to simulate each state transition and assert that the fake Linear issue snapshot contains the expected managed label. Also update `README.md` in the “How Colin Works” section so operators know that Colin now ensures and manages Codex review status labels in Linear, what those labels mean, and that they are automatically removed when no longer applicable.

The independently verifiable result of this milestone is an end-to-end fake test that shows a Linear issue gaining and losing the correct managed review label based on GitHub Codex review signals, plus documentation that matches the implementation.

## Concrete Steps

Work from the repository root:

```
cd /Users/pme/src/pmenglund/colin/.colin/workspaces/COLIN-128
```

Edit the shared constants first.

```
apply_patch on internal/domain/types.go
```

Add tracker interface support next.

```
apply_patch on internal/tracker/tracker.go
apply_patch on internal/tracker/linear/client.go
apply_patch on internal/tracker/linear/client_test.go
```

Generalize startup label initialization.

```
apply_patch on internal/service/service.go
apply_patch on internal/service/service_test.go or internal/service/e2e_test.go
```

Add the Codex review state helper and orchestrator sync.

```
apply_patch on internal/repoops/manager.go or a new sibling file under internal/repoops/
apply_patch on internal/orchestrator/orchestrator.go
apply_patch on internal/orchestrator/orchestrator_test.go
```

Add terminal cleanup for merge success.

```
apply_patch on internal/agent/codex/runner.go
apply_patch on internal/agent/codex/runner_test.go if a focused unit test is cleaner than covering this only through service tests
```

Teach the fake Linear server how labels actually mutate.

```
apply_patch on internal/service/e2e_test.go
```

Update docs last, once behavior and exact label names are final.

```
apply_patch on README.md
```

Format and test after each milestone.

```
gofmt -w internal/domain/types.go internal/tracker/tracker.go internal/tracker/linear/client.go internal/tracker/linear/client_test.go internal/service/service.go internal/orchestrator/orchestrator.go internal/orchestrator/orchestrator_test.go internal/agent/codex/runner.go internal/service/e2e_test.go README.md
```

Run the focused tests first so failures are easier to interpret.

```
go test ./internal/tracker/linear ./internal/repoops ./internal/orchestrator ./internal/agent/codex ./internal/service
```

Then run the whole Go suite.

```
go test ./...
```

Expected success transcript after the implementation is complete will look like:

```
ok  	github.com/pmenglund/colin/internal/tracker/linear
ok  	github.com/pmenglund/colin/internal/repoops
ok  	github.com/pmenglund/colin/internal/orchestrator
ok  	github.com/pmenglund/colin/internal/agent/codex
ok  	github.com/pmenglund/colin/internal/service
ok  	github.com/pmenglund/colin/...
```

If one of the new label-sync tests fails, the most likely causes are wrong precedence, forgetting to remove stale labels, or leaving the fake Linear server’s issue snapshot static.

## Validation and Acceptance

Acceptance is behavioral, not structural.

First, startup must ensure all managed labels exist. A fresh fake Linear server with no labels should observe Colin looking up and then creating `paused`, `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback` exactly once each. Re-running the same startup path should reuse cached or existing labels and should not create duplicates.

Second, the orchestrator sync must keep exactly one managed Codex review label on an issue with a linked pull request. The expected mapping is:

* unresolved Codex review threads present: the issue has `codex-review: unresolved-feedback` and does not have the other two managed review labels
* no unresolved threads, latest `EYES` reaction newer than latest `THUMBS_UP` or approval absent: the issue has `codex-review: pending`
* no unresolved threads, latest `THUMBS_UP` newer than latest `EYES`: the issue has `codex-review: approved`
* no Codex review signal at all, or no linked pull request: the issue has none of the managed review labels

Third, successful merge must remove all managed Codex review labels before the issue leaves merge-related states. This must be true even if the issue previously had `codex-review: approved` or `codex-review: unresolved-feedback`.

The strongest proof should be a fake service-level scenario in `internal/service/e2e_test.go` that mutates fake GitHub reactions and fake GitHub review threads across polls and then asserts on the fake Linear issue’s label set. A new contributor should be able to read that test alone and understand the intended board behavior.

## Idempotence and Recovery

All label operations in this plan are safe to run repeatedly. Ensuring label definitions is idempotent because the Linear client first looks up a label by name and only creates it when missing. Syncing issue labels is also idempotent because the desired end state is “exactly one of these labels” or “none of these labels”, and each poll should converge toward that state.

If implementation goes halfway and tests fail, revert only the managed review label additions, not unrelated changes in the worktree. The safest rollback is to remove the new managed label constants and the new orchestrator sync call, leaving the existing `paused` label behavior intact. Do not use destructive git commands against unrelated user changes.

If you need to retry service-level tests, make sure the fake Linear server resets its in-memory label state between tests. Each test should use its own fresh fake server instance rather than depending on cross-test cleanup.

## Artifacts and Notes

Use short, focused assertions in tests. Representative examples that the finished implementation should make true are:

```
reviewContext := repoops.ReviewContext{
    CodexReviewRequestedAt: &requested,
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStatePending {
    t.Fatalf("state = %q, want pending", got)
}

reviewContext := repoops.ReviewContext{
    CodexReviewRequestedAt: &requested,
    CodexReviewApprovedAt:  &approvedLater,
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateApproved {
    t.Fatalf("state = %q, want approved", got)
}

reviewContext := repoops.ReviewContext{
    CodexReviewRequestedAt: &requested,
    CodexReviewApprovedAt:  &approvedLater,
    CodexReviewThreads:     []domain.GitHubReviewThread{{ID: "thread-1"}},
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateUnresolvedFeedback {
    t.Fatalf("state = %q, want unresolved_feedback", got)
}
```

For the fake Linear server, the issue snapshot should stop being hard-coded. The label nodes returned from `issueNode` must come from mutable server state so tests can assert real convergence instead of only mutation calls.

## Interfaces and Dependencies

At the end of this work, these interfaces and helpers must exist in concrete form.

The tracker interface in `internal/tracker/tracker.go` must include:

```
type Client interface {
    FetchCandidateIssues(ctx context.Context) ([]domain.Issue, error)
    FetchIssuesByStates(ctx context.Context, stateNames []string) ([]domain.Issue, error)
    FetchIssueStatesByIDs(ctx context.Context, issueIDs []string) ([]domain.Issue, error)
    FetchIssueByID(ctx context.Context, issueID string) (domain.Issue, error)
    UpdateIssueState(ctx context.Context, issueID string, stateName string) error
    EnsureIssueLabel(ctx context.Context, labelName string) error
    AddIssueLabel(ctx context.Context, issueID string, labelName string) error
    RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error
    ...
}
```

The shared domain layer in `internal/domain/types.go` must define stable label names and a helper returning the managed review label set.

The GitHub review helper must expose a pure classification API that only depends on existing `repoops.ReviewContext` fields. A shape like this is preferred:

```
type CodexReviewState string

const (
    CodexReviewStateNone               CodexReviewState = ""
    CodexReviewStatePending            CodexReviewState = "pending"
    CodexReviewStateApproved           CodexReviewState = "approved"
    CodexReviewStateUnresolvedFeedback CodexReviewState = "unresolved_feedback"
)

func CodexReviewStateFromContext(reviewContext ReviewContext) CodexReviewState
func LinearLabelForCodexReviewState(state CodexReviewState) string
```

The orchestrator must have one helper responsible for convergence of managed review labels. A shape like this is preferred:

```
func (o *Orchestrator) syncCodexReviewLabels(ctx context.Context, issues []domain.Issue)
```

That helper should not fail the entire poll tick when one issue cannot be synced. It should log and continue.

The runner merge path must have one helper that removes all managed review labels best-effort on merge success. Keep it localized so terminal cleanup does not leak into unrelated runner paths.

Revision note: Initial ExecPlan created on 2026-03-30 to define the implementation path before making any code changes. The plan records the chosen label taxonomy, precedence rules, and the need for periodic sync plus terminal cleanup so a future contributor can implement the feature without rediscovering those decisions.
21:09:51 UTCsession_started
21:09:51 UTCturn_completed
21:09:50 UTCother_message
# Mirror Codex PR Review Status Into Linear Labels

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

## Purpose / Big Picture

After this change, a person looking at the Linear board will be able to tell, from issue labels alone, whether Codex review for the linked GitHub pull request is still in progress, has finished successfully, or still has unresolved Codex feedback. Colin will also create any missing review-status labels at startup so the sync does not silently fail on a new workspace or a newly configured Linear team.

The visible result is simple. For tracked issues with a linked pull request, Colin will keep exactly one of these labels in sync with GitHub Codex review status: `codex-review: pending`, `codex-review: approved`, or `codex-review: unresolved-feedback`. If there is no current Codex review status to show, Colin will remove those managed labels so stale board state does not linger.

## Progress

- [ ] Add managed Codex review label constants and helper functions in the shared domain layer.
- [ ] Extend the tracker interface and Linear client with safe label removal support.
- [ ] Generalize service startup label initialization so Colin ensures `paused` plus all managed Codex review labels exist.
- [ ] Add a single, explicit Codex review state machine based on existing GitHub review signals.
- [ ] Add periodic best-effort label syncing for tracked issues that already have a GitHub pull request.
- [ ] Clear managed Codex review labels when a pull request is merged and the issue leaves review-related states.
- [ ] Update fake tracker servers and test doubles so add/remove label mutations change per-issue label state.
- [ ] Add focused unit tests for label creation, label removal, review-state mapping, and label sync precedence.
- [ ] Add or extend service-level fake end-to-end coverage to prove label sync behavior is observable.
- [ ] Update `README.md` so operators know which labels Colin manages and what each label means.

## Surprises & Discoveries

- Observation: Colin already has the GitHub-side data needed for this feature. `internal/repoops/manager.go` already computes three Codex review signals inside `ReviewContext`: unresolved Codex review threads, the latest `EYES` reaction from `chatgpt-codex-connector[bot]`, and the latest `THUMBS_UP` reaction from the same bot.  
  Evidence: `ReviewContext` already exposes `CodexReviewThreads`, `CodexReviewRequestedAt`, and `CodexReviewApprovedAt`, and `internal/agent/codex/runner.go` already uses those fields to block merge when Codex review is still pending.

- Observation: Colin can create and add Linear labels today, but it cannot remove them. Without removal support, any review label introduced by this feature would become stale as soon as GitHub state changes.  
  Evidence: `internal/tracker/tracker.go` only exposes `EnsureIssueLabel` and `AddIssueLabel`, and `internal/tracker/linear/client.go` implements only `issueAddLabel`-style mutations.

- Observation: startup currently ensures only the `paused` label exists. The new labels must be added to the same startup path or the first sync pass will depend on manual Linear setup.  
  Evidence: `internal/service/service.go` has a dedicated `ensurePausedLabel` helper called from `Service.Run`.

- Observation: the fake Linear server used by service tests does not currently persist per-issue label changes; it returns a static `e2e` label on every issue snapshot.  
  Evidence: `internal/service/e2e_test.go` stores label definitions in `fakeLinearServer.labels`, but `issueNode` always returns `labels.nodes = [{"name":"e2e"}]`.

## Decision Log

- Decision: Manage exactly three mutually exclusive Linear labels: `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback`.  
  Rationale: The issue asks for three user-visible states. A single managed state dimension is easier to understand on the Linear board than a combinatorial set of overlapping labels.  
  Date/Author: 2026-03-30 / Codex

- Decision: Give `unresolved-feedback` the highest precedence, then `pending`, then `approved`, and otherwise show no managed Codex review label.  
  Rationale: unresolved feedback is the most actionable state; if Codex review threads are still open, the board should say that even if an earlier reaction history also contains `EYES` or `THUMBS_UP`.  
  Date/Author: 2026-03-30 / Codex

- Decision: Sync review labels best-effort during the orchestrator poll loop for tracked non-terminal states, instead of only when a runner happens to touch the issue.  
  Rationale: the user wants board visibility. That requires the labels to update while an issue sits idle in `Review` or `Merge`, not only when another run starts.  
  Date/Author: 2026-03-30 / Codex

- Decision: Clear all managed Codex review labels explicitly on successful merge, even if the periodic sync does not inspect terminal states.  
  Rationale: otherwise a merged issue could retain a stale review label forever after leaving Colin’s tracked handoff states.  
  Date/Author: 2026-03-30 / Codex

## Outcomes & Retrospective

Implementation has not started yet. A successful outcome will mean that a tracked issue with a linked pull request can move through these observable board states without manual label editing: pending review, approved review, unresolved feedback, and finally no managed review label after merge or when no GitHub Codex review signal exists.

When this plan is executed, update this section with the exact tests added, any deviations from the initial label taxonomy, and whether the final implementation kept sync entirely inside the orchestrator or needed a second sync hook elsewhere.

## Context and Orientation

Colin is a long-running Go service. It polls Linear, prepares per-issue workspaces, asks GitHub for pull request state, and runs Codex against issues in active Linear states. The code relevant to this feature is split across a few packages.

`internal/service/service.go` is the startup layer. It loads runtime dependencies and currently ensures the `paused` label exists before the orchestrator starts. This is the right place to ensure the new managed review labels exist too.

`internal/tracker/tracker.go` defines the tracker interface used by both the orchestrator and the runner. `internal/tracker/linear/client.go` is the Linear GraphQL implementation. This code already knows how to find a label by name, create a missing label, and add a label to an issue. It needs symmetrical removal support so Colin can remove stale review labels.

`internal/repoops/manager.go` is the GitHub adapter. Its `ReviewContext` type already exposes everything needed to classify Codex review state: unresolved Codex review threads, the latest Codex `EYES` reaction, and the latest Codex `THUMBS_UP` reaction.

`internal/orchestrator/orchestrator.go` owns the poll loop. It is the right place to perform best-effort periodic syncing from GitHub review state into Linear labels, because it already visits tracked issues continuously.

`internal/agent/codex/runner.go` handles publish and merge flows. Merge success is the one place where an issue can leave review-related states permanently, so this file is the right place for explicit terminal cleanup of managed review labels.

The main tests that must move with this feature are in `internal/tracker/linear/client_test.go`, `internal/repoops/manager_test.go`, `internal/orchestrator/orchestrator_test.go`, and `internal/service/e2e_test.go`. The last file is especially important because it uses fake Linear and fake GitHub behavior to prove Colin’s end-to-end orchestration.

## Plan of Work

### Milestone 1: Add label primitives and symmetric Linear label operations

Start by making label management a first-class concept. In `internal/domain/types.go`, add constants for the three managed Codex review labels and a helper that returns the full slice of managed review labels in a stable order. Keep the existing `PausedIssueLabel` constant; this change is additive.

Then extend `internal/tracker/tracker.go` with a `RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error` method. Implement it in `internal/tracker/linear/client.go`. Do not make removal create missing labels. Removal should first look up the existing label ID by name, treat a missing label definition as a no-op, and only call the Linear mutation when the label exists. This avoids the nonsense case where a removal request creates a label just so it can delete it from one issue.

While editing `internal/service/service.go`, replace `ensurePausedLabel` with a more general helper that ensures `paused` plus all managed Codex review labels exist during service startup. Keep the startup behavior idempotent: rerunning Colin should only reuse existing labels, not duplicate them.

The independently verifiable result of this milestone is that the tracker can add and remove named labels safely, and service startup ensures the label definitions exist before sync logic runs. Prove this with focused tests in `internal/tracker/linear/client_test.go` and a service-level startup test or e2e assertion that the fake Linear server sees creation requests for missing managed labels.

### Milestone 2: Define one Codex review state machine and sync it during polling

Next, define one explicit state machine for Codex review status. Put it in a location both orchestration code and runner code can use without circular imports. A small helper near `internal/repoops/manager.go` is a good fit because `ReviewContext` already lives there. Add a type such as `CodexReviewState` with values for `none`, `pending`, `approved`, and `unresolved_feedback`, plus a helper that maps that state to the exact Linear label name.

The classification rules must be:

If `len(reviewContext.CodexReviewThreads) > 0`, the state is `unresolved_feedback`.

Otherwise, if there is a latest Codex `EYES` reaction and there is no later Codex `THUMBS_UP` reaction, the state is `pending`.

Otherwise, if there is a latest Codex `EYES` reaction and there is a later Codex `THUMBS_UP` reaction, the state is `approved`.

Otherwise, the state is `none`.

After that, add a new best-effort label sync pass to the orchestrator poll loop. The goal is not to block dispatch when GitHub or Linear blips. The goal is to keep the board reasonably fresh. The sync should inspect tracked non-terminal issues, skip issues with no pull request signal, call `Repo.ReviewContext` only for issues that already have an associated pull request, compute the desired managed label, and then make Linear match that desired state by adding the desired label and removing the other managed review labels. If the desired state is `none`, remove all managed review labels.

The simplest place to invoke this is from `internal/orchestrator/orchestrator.go` inside `handleTick`, near the existing state refresh path. Keep the sync isolated in a helper such as `syncCodexReviewLabels(ctx, issues []domain.Issue)` so it can be tested directly. If needed, refactor `refreshIssueStateCounts` to return the fetched issue slice so the same data can drive both board statistics and review-label syncing without a second Linear query.

The independently verifiable result of this milestone is that, given a domain issue and a fake `ReviewContext`, Colin computes one exact managed label and removes stale alternatives. Prove this with new orchestrator tests that cover all precedence cases: unresolved feedback wins over reactions, a later approval beats an older review request, and no signal clears all managed review labels.

### Milestone 3: Clean up labels on merge, update fakes, and document the behavior

Finally, make the feature durable and observable in the existing test harness. In `internal/agent/codex/runner.go`, add a best-effort cleanup call on successful merge that removes all managed Codex review labels before or immediately after the issue transitions out of merge-related states. Do not fail a successful merge if label cleanup fails; log a warning and continue.

Update `internal/service/e2e_test.go` so `fakeLinearServer` stores per-issue labels instead of returning a hard-coded `e2e` label forever. `issueAddLabel` must update the issue’s current label set, and the new `issueRemoveLabel` mutation must remove from that same set. This fake server is the only way to prove the full orchestration path without a real Linear workspace.

Then add at least one service-level scenario that shows the board-facing behavior. It does not need to hit real GitHub. Use the existing fake GitHub reaction and review thread files to simulate each state transition and assert that the fake Linear issue snapshot contains the expected managed label. Also update `README.md` in the “How Colin Works” section so operators know that Colin now ensures and manages Codex review status labels in Linear, what those labels mean, and that they are automatically removed when no longer applicable.

The independently verifiable result of this milestone is an end-to-end fake test that shows a Linear issue gaining and losing the correct managed review label based on GitHub Codex review signals, plus documentation that matches the implementation.

## Concrete Steps

Work from the repository root:

    cd /Users/pme/src/pmenglund/colin/.colin/workspaces/COLIN-128

Edit the shared constants first.

    apply_patch on internal/domain/types.go

Add tracker interface support next.

    apply_patch on internal/tracker/tracker.go
    apply_patch on internal/tracker/linear/client.go
    apply_patch on internal/tracker/linear/client_test.go

Generalize startup label initialization.

    apply_patch on internal/service/service.go
    apply_patch on internal/service/service_test.go or internal/service/e2e_test.go

Add the Codex review state helper and orchestrator sync.

    apply_patch on internal/repoops/manager.go or a new sibling file under internal/repoops/
    apply_patch on internal/orchestrator/orchestrator.go
    apply_patch on internal/orchestrator/orchestrator_test.go

Add terminal cleanup for merge success.

    apply_patch on internal/agent/codex/runner.go
    apply_patch on internal/agent/codex/runner_test.go if a focused unit test is cleaner than covering this only through service tests

Teach the fake Linear server how labels actually mutate.

    apply_patch on internal/service/e2e_test.go

Update docs last, once behavior and exact label names are final.

    apply_patch on README.md

Format and test after each milestone.

    gofmt -w internal/domain/types.go internal/tracker/tracker.go internal/tracker/linear/client.go internal/tracker/linear/client_test.go internal/service/service.go internal/orchestrator/orchestrator.go internal/orchestrator/orchestrator_test.go internal/agent/codex/runner.go internal/service/e2e_test.go README.md

Run the focused tests first so failures are easier to interpret.

    go test ./internal/tracker/linear ./internal/repoops ./internal/orchestrator ./internal/agent/codex ./internal/service

Then run the whole Go suite.

    go test ./...

Expected success transcript after the implementation is complete will look like:

    ok  	github.com/pmenglund/colin/internal/tracker/linear
    ok  	github.com/pmenglund/colin/internal/repoops
    ok  	github.com/pmenglund/colin/internal/orchestrator
    ok  	github.com/pmenglund/colin/internal/agent/codex
    ok  	github.com/pmenglund/colin/internal/service
    ok  	github.com/pmenglund/colin/...

If one of the new label-sync tests fails, the most likely causes are wrong precedence, forgetting to remove stale labels, or leaving the fake Linear server’s issue snapshot static.

## Validation and Acceptance

Acceptance is behavioral, not structural.

First, startup must ensure all managed labels exist. A fresh fake Linear server with no labels should observe Colin looking up and then creating `paused`, `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback` exactly once each. Re-running the same startup path should reuse cached or existing labels and should not create duplicates.

Second, the orchestrator sync must keep exactly one managed Codex review label on an issue with a linked pull request. The expected mapping is:

- unresolved Codex review threads present: the issue has `codex-review: unresolved-feedback` and does not have the other two managed review labels
- no unresolved threads, latest `EYES` reaction newer than latest `THUMBS_UP` or approval absent: the issue has `codex-review: pending`
- no unresolved threads, latest `THUMBS_UP` newer than latest `EYES`: the issue has `codex-review: approved`
- no Codex review signal at all, or no linked pull request: the issue has none of the managed review labels

Third, successful merge must remove all managed Codex review labels before the issue leaves merge-related states. This must be true even if the issue previously had `codex-review: approved` or `codex-review: unresolved-feedback`.

The strongest proof should be a fake service-level scenario in `internal/service/e2e_test.go` that mutates fake GitHub reactions and fake GitHub review threads across polls and then asserts on the fake Linear issue’s label set. A new contributor should be able to read that test alone and understand the intended board behavior.

## Idempotence and Recovery

All label operations in this plan are safe to run repeatedly. Ensuring label definitions is idempotent because the Linear client first looks up a label by name and only creates it when missing. Syncing issue labels is also idempotent because the desired end state is “exactly one of these labels” or “none of these labels”, and each poll should converge toward that state.

If implementation goes halfway and tests fail, revert only the managed review label additions, not unrelated changes in the worktree. The safest rollback is to remove the new managed label constants and the new orchestrator sync call, leaving the existing `paused` label behavior intact. Do not use destructive git commands against unrelated user changes.

If you need to retry service-level tests, make sure the fake Linear server resets its in-memory label state between tests. Each test should use its own fresh fake server instance rather than depending on cross-test cleanup.

## Artifacts and Notes

Use short, focused assertions in tests. Representative examples that the finished implementation should make true are:

    reviewContext := repoops.ReviewContext{
        CodexReviewRequestedAt: &requested,
    }
    if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStatePending {
        t.Fatalf("state = %q, want pending", got)
    }

    reviewContext := repoops.ReviewContext{
        CodexReviewRequestedAt: &requested,
        CodexReviewApprovedAt:  &approvedLater,
    }
    if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateApproved {
        t.Fatalf("state = %q, want approved", got)
    }

    reviewContext := repoops.ReviewContext{
        CodexReviewRequestedAt: &requested,
        CodexReviewApprovedAt:  &approvedLater,
        CodexReviewThreads:     []domain.GitHubReviewThread{{ID: "thread-1"}},
    }
    if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateUnresolvedFeedback {
        t.Fatalf("state = %q, want unresolved_feedback", got)
    }

For the fake Linear server, the issue snapshot should stop being hard-coded. The label nodes returned from `issueNode` must come from mutable server state so tests can assert real convergence instead of only mutation calls.

## Interfaces and Dependencies

At the end of this work, these interfaces and helpers must exist in concrete form.

The tracker interface in `internal/tracker/tracker.go` must include:

    type Client interface {
        FetchCandidateIssues(ctx context.Context) ([]domain.Issue, error)
        FetchIssuesByStates(ctx context.Context, stateNames []string) ([]domain.Issue, error)
        FetchIssueStatesByIDs(ctx context.Context, issueIDs []string) ([]domain.Issue, error)
        FetchIssueByID(ctx context.Context, issueID string) (domain.Issue, error)
        UpdateIssueState(ctx context.Context, issueID string, stateName string) error
        EnsureIssueLabel(ctx context.Context, labelName string) error
        AddIssueLabel(ctx context.Context, issueID string, labelName string) error
        RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error
        ...
    }

The shared domain layer in `internal/domain/types.go` must define stable label names and a helper returning the managed review label set.

The GitHub review helper must expose a pure classification API that only depends on existing `repoops.ReviewContext` fields. A shape like this is preferred:

    type CodexReviewState string

    const (
        CodexReviewStateNone               CodexReviewState = ""
        CodexReviewStatePending            CodexReviewState = "pending"
        CodexReviewStateApproved           CodexReviewState = "approved"
        CodexReviewStateUnresolvedFeedback CodexReviewState = "unresolved_feedback"
    )

    func CodexReviewStateFromContext(reviewContext ReviewContext) CodexReviewState
    func LinearLabelForCodexReviewState(state CodexReviewState) string

The orchestrator must have one helper responsible for convergence of managed review labels. A shape like this is preferred:

    func (o *Orchestrator) syncCodexReviewLabels(ctx context.Context, issues []domain.Issue)

That helper should not fail the entire poll tick when one issue cannot be synced. It should log and continue.

The runner merge path must have one helper that removes all managed review labels best-effort on merge success. Keep it localized so terminal cleanup does not leak into unrelated runner paths.

Revision note: Initial ExecPlan created on 2026-03-30 to define the implementation path before making any code changes. The plan records the chosen label taxonomy, precedence rules, and the need for periodic sync plus terminal cleanup so a future contributor can implement the feature without rediscovering those decisions.
21:07:28 UTCother_message
One last detail matters for the plan: the fake Linear server currently knows label definitions but does not model per-issue label add/remove state beyond the static issue payload. I’m checking that so the ExecPlan can spell out the fixture changes instead of hand-waving them.
21:07:13 UTCother_message
I have enough to write the plan. The main design choice is to make the labels a single managed state machine with explicit precedence and stale-label cleanup, because startup-only creation without periodic syncing would leave the Linear board wrong as GitHub review status changes.
21:06:27 UTCother_message
I’ve pinned down the three Codex review states already present in behavior: review requested via `EYES`, review approved via `THUMBS_UP`, and unresolved Codex review threads. That is enough to define concrete Linear labels instead of inventing a new workflow.
21:06:17 UTCother_message
The remaining gap is the actual Codex review lifecycle inside the runner, because the new labels need to mirror that lifecycle rather than generic PR state. I’m reading those branches now so the plan can define an exact label state machine.
21:06:05 UTCother_message
I have the likely implementation shape now. I’m checking the Linear client’s label-creation semantics so the plan can state whether startup should ensure labels globally or per-team and whether the existing helper is enough.
21:05:55 UTCother_message
I’ve confirmed the repository already recognizes Codex review signals from GitHub reactions, which makes label mirroring feasible, but there is still no general label-removal path. I’m checking the tracker test scaffolding next because the plan needs to call out every test fixture that must evolve.
21:05:46 UTCother_message
I’m gathering the exact extension points and existing tests so the plan can tell a novice precisely where to edit, what behavior to add, and how to prove it works.
21:05:35 UTCother_message
Create an ExecPlan for the Linear issue below.

Do not modify repository files or implement the change yet.
Return only the final ExecPlan markdown document as file contents, without surrounding commentary and without wrapping it in an outer triple-backtick fence.

Issue context:
- Identifier: COLIN-128
- Title: codex PR review labels
- State: In Progress
- URL: https://linear.app/bothnia/issue/COLIN-128/codex-pr-review-labels

Issue description:

we should mirror the github PR review status by codex using linear issue labels, so it is possible to see what state an issue is in when I look at the linear board. right now I can't see if an issue is undergoing review, if the review has finished, or if there are unresolved feedback - surface this as labels, and verify if the labels exist when starting (and create the missing ones)

ExecPlan authoring guide:

OVERRIDE Instructions for using these plans from within Codex:

If you are instructed to use this plan to create an ephemeral plan, create a temporary file with your ExecPlan outside of the repository and track your progress there, and incorporate the final plan into your summary.

---

# ChatGPT Codex Execution Plans (ExecPlans):

This document describes the requirements for an execution plan ("ExecPlan"), a design document that a stateless coding agent can follow to deliver a working feature or system change. Treat the reader as a complete beginner to this repository: they have only the current working tree and the single ExecPlan file you provide. There is no memory of prior plans and no external context.

NON-NEGOTIABLE REQUIREMENTS:

* EVERY EXECPLAN MUST BE FULLY SELF-CONTAINED. Self-contained means that in its current form it contains all knowledge and instructions needed for a novice to succeed.
* EVERY EXECPLAN IS A LIVING DOCUMENT. Contributors are required to revise it as progress is made, as discoveries occur, and as design decisions are finalized. Each revision must remain fully self-contained.
* EVERY EXECPLAN MUST ENABLE A COMPLETE NOVICE TO IMPLEMENT THE FEATURE END-TO-END WITHOUT PRIOR KNOWLEDGE OF THIS REPO.
* EVERY EXECPLAN MUST PRODUCE A DEMONSTRABLY WORKING BEHAVIOR, NOT JUST CODE CHANGES THAT "MEET A DEFINITION".
* DEFINE EVERY TERM OF ART IN PLAIN LANGUAGE OR DO NOT USE IT.

Purpose and intent come first. Begin by explaining, in a few sentences, why the work matters from a user's perspective: what someone can do after this change that they could not do before, and how to see it working. Then guide the reader through the exact steps to achieve that outcome, including what to edit, what to run, and what they should observe.

The agent executing your plan can list files, read files, search, run the project, and run tests. It does not know any prior context and cannot infer what you meant from earlier milestones. Repeat any assumption you rely on. Do not point to external blogs or docs; if knowledge is required, embed it in the plan itself in your own words.

Format and envelope are simple and strict. Each ExecPlan must be one single fenced code block labeled as `md` that begins and ends with triple backticks. Do not nest additional triple-backtick code fences inside; when you need to show commands, transcripts, diffs, or code, present them as indented blocks within that single fence. Use indentation for clarity rather than code fences inside an ExecPlan to avoid prematurely closing the ExecPlan's code fence. Use two newlines after every heading, use # and ## and so on, and correct syntax for ordered and unordered lists.

When writing an ExecPlan to a Markdown (.md) file where the content of the file *is only* the single ExecPlan, you should omit the triple backticks.

Write in plain prose. Prefer sentences over lists. Avoid checklists, tables, and long enumerations unless brevity would obscure meaning. Checklists are permitted only in the `Progress` section, where they are mandatory. Narrative sections must remain prose-first.

Self-containment and plain language are paramount. If you introduce a phrase that is not ordinary English ("daemon", "middleware", "RPC gateway", "filter graph"), define it immediately and remind the reader how it manifests in this repository (for example, by naming the files or commands where it appears). Do not say "as defined previously" or "according to the architecture doc." Include the needed explanation here, even if you repeat yourself.

Anchor everything to observable outcomes. State what the user can do after implementation, the commands to run, and the outputs they should see. Acceptance should be phrased as behavior a human can verify ("after starting the server, navigating to [http://localhost:8080/health](http://localhost:8080/health) returns HTTP 200 with body OK") rather than internal attributes ("added a HealthCheck struct"). If a change is internal, explain how its impact can still be demonstrated (for example, by running tests that fail before and pass after, and by showing a scenario that uses the new behavior).

Specify repository context explicitly. Name files with full repository-relative paths, name functions and modules precisely, and describe where new files should be created. If touching multiple areas, include a short orientation paragraph that explains how those parts fit together so a novice can navigate confidently. When running commands, show the working directory and exact command line. When outcomes depend on environment, state the assumptions and provide alternatives when reasonable.

Be idempotent and safe. Write the steps so they can be run multiple times without causing damage or drift. If a step can fail halfway, include how to retry or revert. If a migration or destructive operation is necessary, spell out backups or safe fallbacks. Prefer additive, testable changes that can be validated as you go.

Validation is not optional. Include instructions to run tests, to start the system if applicable, and to observe it doing something useful. Include expected outputs and error messages so a novice can tell success from failure. Where possible, show how to prove that the change is effective beyond compilation (for example, through a small end-to-end scenario, a CLI invocation, or an HTTP request/response transcript). State the exact test commands appropriate to the project’s toolchain and how to interpret their results.

Capture evidence. When your steps produce terminal output, short diffs, or logs, include them inside the single fenced block as indented transcript examples so the agent knows what success looks like. If the output is long, include the most relevant lines and say what to look for. If you need to include a patch, prefer file-scoped diffs or small excerpts that a reader can recreate by following your instructions rather than pasting large blobs.

Milestones are narrative, not bureaucracy. If you break the work into milestones, introduce each with a brief paragraph that describes the scope, what will exist at the end of the milestone that did not exist before, the commands to run, and the acceptance you expect to observe. Keep it readable as a story: goal, work, result, proof. Progress and milestones are distinct: milestones tell the story, progress tracks granular work. Both must exist. Never abbreviate a milestone merely for the sake of brevity, do not leave out details that could be crucial to a future implementation.

Each milestone must be independently verifiable and incrementally implement the overall goal of the execution plan.

Living plans and design decisions.

* ExecPlans are living documents. As you make key design decisions, update the plan to record both the decision and the thinking behind it. Record all decisions in the `Decision Log` section.
* ExecPlans must contain and maintain a `Progress` section, a `Surprises & Discoveries` section, a `Decision Log`, and an `Outcomes & Retrospective` section. These are not optional.
* When you discover optimizer behavior, performance tradeoffs, unexpected bugs, or inverse/unapply semantics that shaped your approach, capture those observations in the `Surprises & Discoveries` section with short evidence snippets (test output is ideal).
* If you change course mid-implementation, document why in the `Decision Log` and reflect the implications in `Progress`. Plans are guides for the next contributor as much as checklists for you.
* At completion of a major task or the full plan, write an `Outcomes & Retrospective` entry summarizing what was achieved, what remains, and lessons learned.

Prototyping milestones and parallel implementations.

* It is acceptable—and often encouraged—to include explicit prototyping milestones when they de-risk a larger change. Examples: adding a low-level operator to a dependency to validate feasibility, or exploring two composition orders while measuring optimizer effects.
* Keep prototypes additive and testable. Clearly label the scope as “prototyping”; describe how to run and observe results; and state the criteria for promoting or discarding the prototype.
* Parallel implementations (e.g., keeping an adapter alongside an older path during migration) are fine when they reduce risk. Describe how to validate both paths and how to retire one safely with tests.

Avoid common failure modes. Do not rely on undefined jargon. Do not describe "the letter of a feature" so narrowly that the resulting code compiles but does nothing meaningful. Do not outsource key decisions to the reader. When ambiguity exists, resolve it in the plan itself and explain why you chose that path. Err on the side of over-explaining user-visible effects and under-specifying incidental implementation details.

---

## Skeleton of a Good ExecPlan

```
# <Short, action-oriented description>

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

## Purpose / Big Picture

Explain in a few sentences what someone gains after this change and how they can see it working. State the user-visible behavior you will enable.

## Progress

Use a list with checkboxes to summarize granular steps. Every stopping point must be documented here, even if it requires splitting a partially completed task into two (“done” vs. “remaining”). This section must always reflect the actual current state of the work.

- [x] Example completed step.  
- [ ] Example incomplete step.  
- [ ] Example partially completed step (completed: X; remaining: Y).  

## Surprises & Discoveries

Document unexpected behaviors, bugs, optimizations, or insights discovered during implementation. Provide concise evidence.  

- Observation: …  
  Evidence: …  

## Decision Log

Record every decision made while working on the plan in the format:  

- Decision: …  
  Rationale: …  
  Date/Author: …  

## Outcomes & Retrospective

Summarize outcomes, gaps, and lessons learned at major milestones or at completion. Compare the result against the original purpose.  

## Context and Orientation

Describe the current state relevant to this task as if the reader knows nothing. Name the key files and modules by full path. Define any non-obvious term you will use. Do not refer to prior plans.

## Plan of Work

Describe, in prose, the sequence of edits and additions. For each edit, name the file and location (function, module) and what to insert or change. Keep it concrete and minimal.

## Concrete Steps

State the exact commands to run and where to run them (working directory). When a command generates output, show a short expected transcript so the reader can compare. This section must be updated as work proceeds.

## Validation and Acceptance

Describe how to start or exercise the system and what to observe. Phrase acceptance as behavior, with specific inputs and outputs. If tests are involved, say "run <project’s test command> and expect <N> passed; the new test <name> fails before the change and passes after>".

## Idempotence and Recovery

If steps can be repeated safely, say so. If a step is risky, provide a safe retry or rollback path. Keep the environment clean after completion.

## Artifacts and Notes

Include the most important transcripts, diffs, or snippets as indented examples. Keep them concise and focused on what proves success.

## Interfaces and Dependencies

Be prescriptive. Name the libraries, modules, and services to use and why. Specify the types, traits/interfaces, and function signatures that must exist at the end of the milestone. Prefer stable names and paths such as `server.Execute()` or `server.Fetcher`.

    type Planner interface {
        func Plan(context.Context, Config) Plan;
    }
```

If you follow the guidance above, a single, stateless agent -- or a human novice -- can read your ExecPlan from top to bottom and produce a working, observable result. That is the bar: SELF-CONTAINED, SELF-SUFFICIENT, NOVICE-GUIDING, OUTCOME-FOCUSED.

When you revise a plan, you must ensure your changes are comprehensively reflected across all sections, including the living document sections, and you must write a note at the bottom of the plan describing the change and the reason why. ExecPlans must describe not just the what but the why for almost everything.
21:05:35 UTCsession_started
21:05:35 UTCturn_completed
21:05:35 UTCother_message
COLIN_EXECPLAN_DECISION: EXEC_PLAN

This change is cross-cutting: it needs new Linear label management, mapping GitHub PR review signals into a stable label state machine, and updates to orchestration and tests. It is easy to get wrong with stale or conflicting labels, so a persistent plan is warranted before implementation.
21:05:32 UTCother_message
Your previous ExecPlan strategy response could not be parsed.

Return a short answer.
The first line must be exactly one of:
COLIN_EXECPLAN_DECISION: ONE_SHOT
COLIN_EXECPLAN_DECISION: EXEC_PLAN

After the first line, include a brief rationale in 1-3 sentences.
Do not repeat the original question or issue description.
Your previous first line was: "I’m checking the current workspace and existing Linear/GitHub review handling so the decision reflects the actual implementation scope rather than just the issue text."
21:05:32 UTCsession_started
21:05:32 UTCturn_completed
21:05:31 UTCother_message
COLIN_EXECPLAN_DECISION: EXEC_PLAN

This spans multiple subsystems rather than a single localized change: Linear label lifecycle, orchestrator review-state syncing, GitHub review signal interpretation, and test coverage all need coordinated updates. It also needs an explicit label taxonomy and transition rules so labels do not get stuck or conflict as issues move through `Review` and back to `Todo`.
21:05:11 UTCother_message
I’ve confirmed the service already ensures one label at startup and has issue-label mutations. The open question is whether review state mapping is a localized extension or spreads across orchestrator state, tracker API, docs, and tests enough to justify a stored plan.
21:05:03 UTCother_message
There is already PR review sync and label-management infrastructure, so I’m narrowing this to the exact extension points: Linear label lifecycle and GitHub review-state mirroring.
21:04:56 UTCother_message
I’m checking the current workspace and existing Linear/GitHub review handling so the decision reflects the actual implementation scope rather than just the issue text.
21:04:48 UTCother_message
Decide whether the Linear issue below should be handled as a one-shot change or should first get an ExecPlan.

Return a short answer.
The first line must be exactly one of:
COLIN_EXECPLAN_DECISION: ONE_SHOT
COLIN_EXECPLAN_DECISION: EXEC_PLAN

After the first line, include a brief rationale in 1-3 sentences.
Choose `ONE_SHOT` only when the change is small and safe enough to implement directly without a stored plan.
Choose `EXEC_PLAN` when the issue is large, risky, multi-step, or would benefit from a persistent implementation plan.

Issue context:
- Identifier: COLIN-128
- Title: codex PR review labels
- State: In Progress
- URL: https://linear.app/bothnia/issue/COLIN-128/codex-pr-review-labels

Issue description:

we should mirror the github PR review status by codex using linear issue labels, so it is possible to see what state an issue is in when I look at the linear board. right now I can't see if an issue is undergoing review, if the review has finished, or if there are unresolved feedback - surface this as labels, and verify if the labels exist when starting (and create the missing ones)
21:04:43 UTCsession_started

Retry queue

No retries are waiting. Colin is either idle or actively running work.

Rate limits

Latest limits reported by Codex and Linear.

Codex

3% used of 5h window which resets in 4h13m
21% used of 1w window which resets in 78h54m

Linear

resets in 1h, 4999 of 5000 remaining

API snapshot

Use the JSON endpoint for scripts or debugging outside the browser.

{"generated_at":"2026-03-30T21:15:55Z","running":1,"retrying":0}