21:15:20 UTCother_message

The service end-to-end test regressed. I’m running that test alone with verbose output so I can see whether the new startup label-ensure step or the poll-loop sync changed timing or introduced a hidden startup failure.

21:15:07 UTCother_message

The focused suite is still running. I’m polling for the remaining package results now; if anything else fails, I’ll fix it before widening to `go test ./...`.

21:14:51 UTCother_message

The first test run surfaced one bad assumption: the “approval pending” helper lives in the Codex runner, not in `repoops`. I’m inlining the comparison logic into the new repo review-state helper so the dependency direction stays clean.

21:14:41 UTCother_message

`gofmt` only failed because I included `README.md`. I’m rerunning it on the Go files only, then I’ll execute the focused test set.

21:14:36 UTCother_message

I’m at the verification phase. I’ve got the core logic and tests in place, and I’m formatting before running the targeted Go suites so any failures point to behavior rather than syntax noise.

21:14:23 UTCother_message

I’ve covered the new logic with unit tests. I’m doing the docs update now, then I’ll format and run the focused Go test set to catch any interface drift or missed imports.

21:12:48 UTCother_message

I’m adding the unit tests around the new removal path first. That gives me a tight feedback loop before I add broader sync behavior on top of it.

21:12:25 UTCother_message

The next pass is tests and the fake Linear server. I need those in place before I can safely refactor sync behavior, because the current fake server can’t actually show label convergence yet.

21:11:37 UTCother_message

I’m adding the merge cleanup in the narrowest place: `buildMergedResult`, after the issue reaches its post-merge state but before metadata is persisted. That keeps the cleanup local to successful merge completion.

21:11:12 UTCother_message

I’m wiring the poll-loop sync now. The intent is best-effort convergence: use the existing tracked-issue fetch, classify Codex review state only when there is a PR signal, and aggressively clear stale managed labels when there isn’t.

21:10:33 UTCother_message

I’m editing the Linear client now. The main change here is a no-op-safe removal path so later sync code can converge labels instead of only appending more state.

21:10:18 UTCother_message

I’ve got the integration points. I’m editing the domain, tracker, and repo review-state helpers first, then I’ll wire orchestration and merge cleanup around them and finish with the fake server/test updates.

21:10:16 UTCother_message

I’ve narrowed the edit set. Next I’m checking the merge-success helper and existing runner/orchestrator tests so I can add the smallest cleanup hook and keep the stubs in sync with the tracker interface change.

21:10:08 UTCother_message

I’ve confirmed the tree is clean. I’m reading the exact call sites now so I can thread removal and sync through the existing startup, poll loop, and merge-success paths without widening the change.

21:10:00 UTCother_message

I’m rechecking the worktree and the relevant files before editing so I don’t clobber any changes made since the last turn, then I’ll implement the label lifecycle end to end.

21:09:51 UTCother_message

You are working on Linear issue COLIN-128: codex PR review labels.

Repository rules:
- Follow `AGENTS.md`.
- Make changes only in this repository workspace.
- Prefer the smallest correct change that resolves the issue.
- Run relevant Go tests before you finish.

Issue context:
- State: In Progress
- URL: https://linear.app/bothnia/issue/COLIN-128/codex-pr-review-labels
Issue description:

we should mirror the github PR review status by codex using linear issue labels, so it is possible to see what state an issue is in when I look at the linear board. right now I can't see if an issue is undergoing review, if the review has finished, or if there are unresolved feedback - surface this as labels, and verify if the labels exist when starting (and create the missing ones)

Definition of done:
- Implement the requested change.
- Add or update tests when behavior changes.
- Leave the repo in a clean, reviewable state ready for `Review`.
- Do not merge changes yourself during coding turns; Colin will publish in `Review` and merge in `Merge`.
- Summarize what changed, what was tested, and any remaining risk.

Output contract:
- If the issue is still too underspecified to implement safely, begin your final response with `COLIN_OUTCOME: NEEDS_SPEC`.
- After `COLIN_OUTCOME: NEEDS_SPEC`, explain what information is missing and include the exact sentence `The spec should be improved before implementation.`
- If the issue is implementable, begin your final response with `COLIN_OUTCOME: READY_FOR_REVIEW`.
- `Review` is PR-only. Clarification-only handoffs go to `Refine`.

ExecPlan:

I’m gathering the exact extension points and existing tests so the plan can tell a novice precisely where to edit, what behavior to add, and how to prove it works.

I’ve confirmed the repository already recognizes Codex review signals from GitHub reactions, which makes label mirroring feasible, but there is still no general label-removal path. I’m checking the tracker test scaffolding next because the plan needs to call out every test fixture that must evolve.

I have the likely implementation shape now. I’m checking the Linear client’s label-creation semantics so the plan can state whether startup should ensure labels globally or per-team and whether the existing helper is enough.

The remaining gap is the actual Codex review lifecycle inside the runner, because the new labels need to mirror that lifecycle rather than generic PR state. I’m reading those branches now so the plan can define an exact label state machine.

I’ve pinned down the three Codex review states already present in behavior: review requested via `EYES`, review approved via `THUMBS_UP`, and unresolved Codex review threads. That is enough to define concrete Linear labels instead of inventing a new workflow.

I have enough to write the plan. The main design choice is to make the labels a single managed state machine with explicit precedence and stale-label cleanup, because startup-only creation without periodic syncing would leave the Linear board wrong as GitHub review status changes.

One last detail matters for the plan: the fake Linear server currently knows label definitions but does not model per-issue label add/remove state beyond the static issue payload. I’m checking that so the ExecPlan can spell out the fixture changes instead of hand-waving them.

# Mirror Codex PR Review Status Into Linear Labels

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

## Purpose / Big Picture

After this change, a person looking at the Linear board will be able to tell, from issue labels alone, whether Codex review for the linked GitHub pull request is still in progress, has finished successfully, or still has unresolved Codex feedback. Colin will also create any missing review-status labels at startup so the sync does not silently fail on a new workspace or a newly configured Linear team.

The visible result is simple. For tracked issues with a linked pull request, Colin will keep exactly one of these labels in sync with GitHub Codex review status: `codex-review: pending`, `codex-review: approved`, or `codex-review: unresolved-feedback`. If there is no current Codex review status to show, Colin will remove those managed labels so stale board state does not linger.

## Progress

- [ ] Add managed Codex review label constants and helper functions in the shared domain layer.
- [ ] Extend the tracker interface and Linear client with safe label removal support.
- [ ] Generalize service startup label initialization so Colin ensures `paused` plus all managed Codex review labels exist.
- [ ] Add a single, explicit Codex review state machine based on existing GitHub review signals.
- [ ] Add periodic best-effort label syncing for tracked issues that already have a GitHub pull request.
- [ ] Clear managed Codex review labels when a pull request is merged and the issue leaves review-related states.
- [ ] Update fake tracker servers and test doubles so add/remove label mutations change per-issue label state.
- [ ] Add focused unit tests for label creation, label removal, review-state mapping, and label sync precedence.
- [ ] Add or extend service-level fake end-to-end coverage to prove label sync behavior is observable.
- [ ] Update `README.md` so operators know which labels Colin manages and what each label means.

## Surprises & Discoveries

* Observation: Colin already has the GitHub-side data needed for this feature. `internal/repoops/manager.go` already computes three Codex review signals inside `ReviewContext`: unresolved Codex review threads, the latest `EYES` reaction from `chatgpt-codex-connector[bot]`, and the latest `THUMBS_UP` reaction from the same bot.
Evidence: `ReviewContext` already exposes `CodexReviewThreads`, `CodexReviewRequestedAt`, and `CodexReviewApprovedAt`, and `internal/agent/codex/runner.go` already uses those fields to block merge when Codex review is still pending.
* Observation: Colin can create and add Linear labels today, but it cannot remove them. Without removal support, any review label introduced by this feature would become stale as soon as GitHub state changes.
Evidence: `internal/tracker/tracker.go` only exposes `EnsureIssueLabel` and `AddIssueLabel`, and `internal/tracker/linear/client.go` implements only `issueAddLabel`-style mutations.
* Observation: startup currently ensures only the `paused` label exists. The new labels must be added to the same startup path or the first sync pass will depend on manual Linear setup.
Evidence: `internal/service/service.go` has a dedicated `ensurePausedLabel` helper called from `Service.Run`.
* Observation: the fake Linear server used by service tests does not currently persist per-issue label changes; it returns a static `e2e` label on every issue snapshot.
Evidence: `internal/service/e2e_test.go` stores label definitions in `fakeLinearServer.labels`, but `issueNode` always returns `labels.nodes = [{"name":"e2e"}]`.

## Decision Log

* Decision: Manage exactly three mutually exclusive Linear labels: `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback`.
Rationale: The issue asks for three user-visible states. A single managed state dimension is easier to understand on the Linear board than a combinatorial set of overlapping labels.
Date/Author: 2026-03-30 / Codex
* Decision: Give `unresolved-feedback` the highest precedence, then `pending`, then `approved`, and otherwise show no managed Codex review label.
Rationale: unresolved feedback is the most actionable state; if Codex review threads are still open, the board should say that even if an earlier reaction history also contains `EYES` or `THUMBS_UP`.
Date/Author: 2026-03-30 / Codex
* Decision: Sync review labels best-effort during the orchestrator poll loop for tracked non-terminal states, instead of only when a runner happens to touch the issue.
Rationale: the user wants board visibility. That requires the labels to update while an issue sits idle in `Review` or `Merge`, not only when another run starts.
Date/Author: 2026-03-30 / Codex
* Decision: Clear all managed Codex review labels explicitly on successful merge, even if the periodic sync does not inspect terminal states.
Rationale: otherwise a merged issue could retain a stale review label forever after leaving Colin’s tracked handoff states.
Date/Author: 2026-03-30 / Codex

## Outcomes & Retrospective

Implementation has not started yet. A successful outcome will mean that a tracked issue with a linked pull request can move through these observable board states without manual label editing: pending review, approved review, unresolved feedback, and finally no managed review label after merge or when no GitHub Codex review signal exists.

When this plan is executed, update this section with the exact tests added, any deviations from the initial label taxonomy, and whether the final implementation kept sync entirely inside the orchestrator or needed a second sync hook elsewhere.

## Context and Orientation

Colin is a long-running Go service. It polls Linear, prepares per-issue workspaces, asks GitHub for pull request state, and runs Codex against issues in active Linear states. The code relevant to this feature is split across a few packages.

`internal/service/service.go` is the startup layer. It loads runtime dependencies and currently ensures the `paused` label exists before the orchestrator starts. This is the right place to ensure the new managed review labels exist too.

`internal/tracker/tracker.go` defines the tracker interface used by both the orchestrator and the runner. `internal/tracker/linear/client.go` is the Linear GraphQL implementation. This code already knows how to find a label by name, create a missing label, and add a label to an issue. It needs symmetrical removal support so Colin can remove stale review labels.

`internal/repoops/manager.go` is the GitHub adapter. Its `ReviewContext` type already exposes everything needed to classify Codex review state: unresolved Codex review threads, the latest Codex `EYES` reaction, and the latest Codex `THUMBS_UP` reaction.

`internal/orchestrator/orchestrator.go` owns the poll loop. It is the right place to perform best-effort periodic syncing from GitHub review state into Linear labels, because it already visits tracked issues continuously.

`internal/agent/codex/runner.go` handles publish and merge flows. Merge success is the one place where an issue can leave review-related states permanently, so this file is the right place for explicit terminal cleanup of managed review labels.

The main tests that must move with this feature are in `internal/tracker/linear/client_test.go`, `internal/repoops/manager_test.go`, `internal/orchestrator/orchestrator_test.go`, and `internal/service/e2e_test.go`. The last file is especially important because it uses fake Linear and fake GitHub behavior to prove Colin’s end-to-end orchestration.

## Plan of Work

### Milestone 1: Add label primitives and symmetric Linear label operations

Start by making label management a first-class concept. In `internal/domain/types.go`, add constants for the three managed Codex review labels and a helper that returns the full slice of managed review labels in a stable order. Keep the existing `PausedIssueLabel` constant; this change is additive.
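A minimal sketch of what those constants and the helper could look like. The label strings come from this plan; the identifier names (`CodexReviewPendingLabel`, `ManagedCodexReviewLabels`, and so on) are assumptions for illustration, not names confirmed by the repository.

```go
package main

import "fmt"

// Label names are taken from the plan. The constant and function names are
// hypothetical and may differ from what internal/domain/types.go ends up using.
const (
	CodexReviewPendingLabel            = "codex-review: pending"
	CodexReviewApprovedLabel           = "codex-review: approved"
	CodexReviewUnresolvedFeedbackLabel = "codex-review: unresolved-feedback"
)

// ManagedCodexReviewLabels returns the managed labels in a stable order.
// It builds a fresh slice on every call so callers cannot mutate shared state.
func ManagedCodexReviewLabels() []string {
	return []string{
		CodexReviewPendingLabel,
		CodexReviewApprovedLabel,
		CodexReviewUnresolvedFeedbackLabel,
	}
}

func main() {
	for _, name := range ManagedCodexReviewLabels() {
		fmt.Println(name)
	}
}
```

Returning a new slice each time costs a small allocation but keeps the label set effectively immutable, which matters once startup, sync, and merge cleanup all iterate over it.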

Then extend `internal/tracker/tracker.go` with a `RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error` method. Implement it in `internal/tracker/linear/client.go`. Do not make removal create missing labels. Removal should first look up the existing label ID by name, treat a missing label definition as a no-op, and only call the Linear mutation when the label exists. This avoids the nonsense case where a removal request creates a label just so it can delete it from one issue.
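The no-op-safe removal described above can be sketched against an in-memory stand-in. The `labelStore` type and its fields are hypothetical; only the `RemoveIssueLabel` signature comes from this plan, and the real client would issue a Linear mutation where the sketch increments a counter.

```go
package main

import (
	"context"
	"fmt"
)

// labelStore is a hypothetical in-memory stand-in for the Linear client.
type labelStore struct {
	labelIDs    map[string]string          // label name -> label ID
	issueLabels map[string]map[string]bool // issue ID -> set of label IDs
	mutations   int                        // removal mutations "sent"
}

// RemoveIssueLabel looks the label up by name first. A missing label
// definition is a no-op, so removal never creates a label.
func (s *labelStore) RemoveIssueLabel(_ context.Context, issueID, labelName string) error {
	labelID, ok := s.labelIDs[labelName]
	if !ok {
		return nil
	}
	s.mutations++ // stands in for the Linear removal mutation
	delete(s.issueLabels[issueID], labelID)
	return nil
}

func main() {
	store := &labelStore{
		labelIDs:    map[string]string{"codex-review: pending": "label-1"},
		issueLabels: map[string]map[string]bool{"issue-1": {"label-1": true}},
	}
	ctx := context.Background()
	_ = store.RemoveIssueLabel(ctx, "issue-1", "codex-review: approved") // unknown label: no-op
	_ = store.RemoveIssueLabel(ctx, "issue-1", "codex-review: pending")  // known label: removed
	fmt.Println("mutations:", store.mutations, "labels left:", len(store.issueLabels["issue-1"]))
}
```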

While editing `internal/service/service.go`, replace `ensurePausedLabel` with a more general helper that ensures `paused` plus all managed Codex review labels exist during service startup. Keep the startup behavior idempotent: rerunning Colin should only reuse existing labels, not duplicate them.
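One possible shape for the generalized startup helper, shown with a fake tracker so idempotence is visible. `ensureStartupLabels`, `labelEnsurer`, and `fakeTracker` are hypothetical names; whether startup should abort on the first failure, as this sketch does, is also an assumption.

```go
package main

import (
	"context"
	"fmt"
)

// labelEnsurer is the part of the tracker interface this sketch needs.
type labelEnsurer interface {
	EnsureIssueLabel(ctx context.Context, labelName string) error
}

// fakeTracker records which labels had to be created.
type fakeTracker struct {
	existing map[string]bool
	created  []string
}

func (f *fakeTracker) EnsureIssueLabel(_ context.Context, labelName string) error {
	if f.existing[labelName] {
		return nil // already present: reuse, do not duplicate
	}
	f.existing[labelName] = true
	f.created = append(f.created, labelName)
	return nil
}

// ensureStartupLabels is a hypothetical replacement for ensurePausedLabel.
// It stops at the first error so startup can report which label failed.
func ensureStartupLabels(ctx context.Context, t labelEnsurer, names []string) error {
	for _, name := range names {
		if err := t.EnsureIssueLabel(ctx, name); err != nil {
			return fmt.Errorf("ensure label %q: %w", name, err)
		}
	}
	return nil
}

func main() {
	names := []string{
		"paused",
		"codex-review: pending",
		"codex-review: approved",
		"codex-review: unresolved-feedback",
	}
	t := &fakeTracker{existing: map[string]bool{"paused": true}}
	for run := 1; run <= 2; run++ {
		if err := ensureStartupLabels(context.Background(), t, names); err != nil {
			fmt.Println("startup failed:", err)
			return
		}
		fmt.Printf("run %d: %d labels created so far\n", run, len(t.created))
	}
}
```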

The independently verifiable result of this milestone is that the tracker can add and remove named labels safely, and service startup ensures the label definitions exist before sync logic runs. Prove this with focused tests in `internal/tracker/linear/client_test.go` and a service-level startup test or e2e assertion that the fake Linear server sees creation requests for missing managed labels.

### Milestone 2: Define one Codex review state machine and sync it during polling

Next, define one explicit state machine for Codex review status. Put it in a location both orchestration code and runner code can use without circular imports. A small helper near `internal/repoops/manager.go` is a good fit because `ReviewContext` already lives there. Add a type such as `CodexReviewState` with values for `none`, `pending`, `approved`, and `unresolved_feedback`, plus a helper that maps that state to the exact Linear label name.

The classification rules must be:

If `len(reviewContext.CodexReviewThreads) > 0`, the state is `unresolved_feedback`.

Otherwise, if there is a latest Codex `EYES` reaction and there is no later Codex `THUMBS_UP` reaction, the state is `pending`.

Otherwise, if there is a latest Codex `EYES` reaction and there is a later Codex `THUMBS_UP` reaction, the state is `approved`.

Otherwise, the state is `none`.
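The four rules above can be sketched as a pure function. This is an illustration under assumptions: `ReviewContext` here is a simplified stand-in whose field types (a string slice and `*time.Time` pointers) are guesses, and `at` is a helper invented for the example. An approval with the same timestamp as the request is treated as not later, so it stays `pending`.

```go
package main

import (
	"fmt"
	"time"
)

// ReviewContext is a simplified stand-in for repoops.ReviewContext.
// The field types are assumptions; the plan only names the fields.
type ReviewContext struct {
	CodexReviewThreads     []string
	CodexReviewRequestedAt *time.Time
	CodexReviewApprovedAt  *time.Time
}

type CodexReviewState string

const (
	CodexReviewStateNone               CodexReviewState = ""
	CodexReviewStatePending            CodexReviewState = "pending"
	CodexReviewStateApproved           CodexReviewState = "approved"
	CodexReviewStateUnresolvedFeedback CodexReviewState = "unresolved_feedback"
)

// CodexReviewStateFromContext applies the rules in precedence order.
func CodexReviewStateFromContext(rc ReviewContext) CodexReviewState {
	switch {
	case len(rc.CodexReviewThreads) > 0:
		return CodexReviewStateUnresolvedFeedback
	case rc.CodexReviewRequestedAt == nil:
		return CodexReviewStateNone
	case rc.CodexReviewApprovedAt != nil && rc.CodexReviewApprovedAt.After(*rc.CodexReviewRequestedAt):
		return CodexReviewStateApproved
	default:
		return CodexReviewStatePending
	}
}

// LinearLabelForCodexReviewState maps a state to its label, or "" for none.
func LinearLabelForCodexReviewState(state CodexReviewState) string {
	switch state {
	case CodexReviewStatePending:
		return "codex-review: pending"
	case CodexReviewStateApproved:
		return "codex-review: approved"
	case CodexReviewStateUnresolvedFeedback:
		return "codex-review: unresolved-feedback"
	default:
		return ""
	}
}

// at builds an example timestamp some minutes after a fixed instant.
func at(minutes int) *time.Time {
	t := time.Date(2026, 3, 30, 21, 0, 0, 0, time.UTC).Add(time.Duration(minutes) * time.Minute)
	return &t
}

func main() {
	rc := ReviewContext{CodexReviewRequestedAt: at(0), CodexReviewApprovedAt: at(5)}
	state := CodexReviewStateFromContext(rc)
	fmt.Println(state, "->", LinearLabelForCodexReviewState(state))
}
```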

After that, add a new best-effort label sync pass to the orchestrator poll loop. The goal is not to block dispatch when GitHub or Linear blips. The goal is to keep the board reasonably fresh. The sync should inspect tracked non-terminal issues, skip issues with no pull request signal, call `Repo.ReviewContext` only for issues that already have an associated pull request, compute the desired managed label, and then make Linear match that desired state by adding the desired label and removing the other managed review labels. If the desired state is `none`, remove all managed review labels.

The simplest place to invoke this is from `internal/orchestrator/orchestrator.go` inside `handleTick`, near the existing state refresh path. Keep the sync isolated in a helper such as `syncCodexReviewLabels(ctx, issues []domain.Issue)` so it can be tested directly. If needed, refactor `refreshIssueStateCounts` to return the fetched issue slice so the same data can drive both board statistics and review-label syncing without a second Linear query.
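The per-issue convergence step inside that helper might look like the sketch below. `convergeReviewLabel`, `labelMutator`, and `memoryTracker` are hypothetical names; the method signatures mirror the tracker interface in this plan. The function keeps going after an error and returns only the first one, which is one way to stay best-effort while still giving the caller something to log.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// labelMutator is the part of the tracker interface this sketch needs.
type labelMutator interface {
	AddIssueLabel(ctx context.Context, issueID string, labelName string) error
	RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error
}

// memoryTracker is a hypothetical in-memory tracker keyed by issue ID.
type memoryTracker struct {
	labels map[string]map[string]bool
}

func (m *memoryTracker) AddIssueLabel(_ context.Context, issueID, labelName string) error {
	if m.labels[issueID] == nil {
		m.labels[issueID] = map[string]bool{}
	}
	m.labels[issueID][labelName] = true
	return nil
}

func (m *memoryTracker) RemoveIssueLabel(_ context.Context, issueID, labelName string) error {
	delete(m.labels[issueID], labelName)
	return nil
}

// names returns the issue's labels sorted for stable output.
func (m *memoryTracker) names(issueID string) []string {
	out := make([]string, 0, len(m.labels[issueID]))
	for name := range m.labels[issueID] {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// convergeReviewLabel adds the desired label (if any) and removes every other
// managed label. An empty desired label clears all managed labels.
func convergeReviewLabel(ctx context.Context, t labelMutator, issueID, desired string, managed []string) error {
	var firstErr error
	if desired != "" {
		if err := t.AddIssueLabel(ctx, issueID, desired); err != nil {
			firstErr = err
		}
	}
	for _, name := range managed {
		if name == desired {
			continue
		}
		if err := t.RemoveIssueLabel(ctx, issueID, name); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	managed := []string{"codex-review: pending", "codex-review: approved", "codex-review: unresolved-feedback"}
	t := &memoryTracker{labels: map[string]map[string]bool{
		"issue-1": {"codex-review: pending": true, "e2e": true},
	}}
	_ = convergeReviewLabel(context.Background(), t, "issue-1", "codex-review: approved", managed)
	fmt.Println(t.names("issue-1"))
}
```

Unmanaged labels such as `e2e` are never touched because the loop only iterates over the managed set.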

The independently verifiable result of this milestone is that, given a domain issue and a fake `ReviewContext`, Colin computes one exact managed label and removes stale alternatives. Prove this with new orchestrator tests that cover all precedence cases: unresolved feedback wins over reactions, a later approval beats an older review request, and no signal clears all managed review labels.

### Milestone 3: Clean up labels on merge, update fakes, and document the behavior

Finally, make the feature durable and observable in the existing test harness. In `internal/agent/codex/runner.go`, add a best-effort cleanup call on successful merge that removes all managed Codex review labels before or immediately after the issue transitions out of merge-related states. Do not fail a successful merge if label cleanup fails; log a warning and continue.
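A sketch of the warn-and-continue cleanup, assuming a hypothetical `clearManagedReviewLabels` helper. It deliberately has no return value, so a label failure cannot turn a successful merge into an error. The `flakyRemover` fake and the use of the standard `log` package are illustrative; the runner may use a different logger.

```go
package main

import (
	"context"
	"errors"
	"log"
)

// labelRemover is the part of the tracker interface this sketch needs.
type labelRemover interface {
	RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error
}

// flakyRemover fails for one label name to simulate a Linear error.
type flakyRemover struct {
	failOn  string
	removed []string
}

func (f *flakyRemover) RemoveIssueLabel(_ context.Context, _ string, labelName string) error {
	if labelName == f.failOn {
		return errors.New("linear unavailable")
	}
	f.removed = append(f.removed, labelName)
	return nil
}

// clearManagedReviewLabels is a hypothetical merge-success hook. Failures are
// logged as warnings and never propagated to the merge result.
func clearManagedReviewLabels(ctx context.Context, t labelRemover, issueID string, managed []string) {
	for _, name := range managed {
		if err := t.RemoveIssueLabel(ctx, issueID, name); err != nil {
			log.Printf("warning: remove label %q from %s: %v", name, issueID, err)
		}
	}
}

func main() {
	managed := []string{"codex-review: pending", "codex-review: approved", "codex-review: unresolved-feedback"}
	r := &flakyRemover{failOn: "codex-review: approved"}
	clearManagedReviewLabels(context.Background(), r, "COLIN-128", managed)
	log.Printf("removed %d of %d managed labels", len(r.removed), len(managed))
}
```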

Update `internal/service/e2e_test.go` so `fakeLinearServer` stores per-issue labels instead of returning a hard-coded `e2e` label forever. `issueAddLabel` must update the issue’s current label set, and the new `issueRemoveLabel` mutation must remove from that same set. This fake server is the only way to prove the full orchestration path without a real Linear workspace.
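The mutable per-issue label state could be modelled roughly as follows. `fakeIssueLabels` and its methods are hypothetical; in the real test file this state would presumably live on `fakeLinearServer` and be read by `issueNode` when it builds the snapshot.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeIssueLabels is a hypothetical piece of fake-server state: a mutable
// label set per issue, guarded by a mutex because HTTP handlers run
// concurrently with the test goroutine.
type fakeIssueLabels struct {
	mu     sync.Mutex
	labels map[string]map[string]struct{}
}

func newFakeIssueLabels() *fakeIssueLabels {
	return &fakeIssueLabels{labels: map[string]map[string]struct{}{}}
}

// add is what an issueAddLabel handler would call.
func (f *fakeIssueLabels) add(issueID, name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.labels[issueID] == nil {
		f.labels[issueID] = map[string]struct{}{}
	}
	f.labels[issueID][name] = struct{}{}
}

// remove is what an issueRemoveLabel handler would call.
func (f *fakeIssueLabels) remove(issueID, name string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.labels[issueID], name)
}

// nodes returns the label names an issue snapshot would report, sorted so
// test assertions are deterministic.
func (f *fakeIssueLabels) nodes(issueID string) []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := make([]string, 0, len(f.labels[issueID]))
	for name := range f.labels[issueID] {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	state := newFakeIssueLabels()
	state.add("issue-1", "e2e")
	state.add("issue-1", "codex-review: pending")
	state.remove("issue-1", "codex-review: pending")
	state.add("issue-1", "codex-review: approved")
	fmt.Println(state.nodes("issue-1"))
}
```

Creating the state through a constructor makes it natural for each test to build its own fresh instance instead of relying on cross-test cleanup.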

Then add at least one service-level scenario that shows the board-facing behavior. It does not need to hit real GitHub. Use the existing fake GitHub reaction and review thread files to simulate each state transition and assert that the fake Linear issue snapshot contains the expected managed label. Also update `README.md` in the “How Colin Works” section so operators know that Colin now ensures and manages Codex review status labels in Linear, what those labels mean, and that they are automatically removed when no longer applicable.

The independently verifiable result of this milestone is an end-to-end fake test that shows a Linear issue gaining and losing the correct managed review label based on GitHub Codex review signals, plus documentation that matches the implementation.

## Concrete Steps

Work from the repository root:

```
cd /Users/pme/src/pmenglund/colin/.colin/workspaces/COLIN-128
```

Edit the shared constants first.

```
apply_patch on internal/domain/types.go
```

Add tracker interface support next.

```
apply_patch on internal/tracker/tracker.go
apply_patch on internal/tracker/linear/client.go
apply_patch on internal/tracker/linear/client_test.go
```

Generalize startup label initialization.

```
apply_patch on internal/service/service.go
apply_patch on internal/service/service_test.go or internal/service/e2e_test.go
```

Add the Codex review state helper and orchestrator sync.

```
apply_patch on internal/repoops/manager.go or a new sibling file under internal/repoops/
apply_patch on internal/orchestrator/orchestrator.go
apply_patch on internal/orchestrator/orchestrator_test.go
```

Add terminal cleanup for merge success.

```
apply_patch on internal/agent/codex/runner.go
apply_patch on internal/agent/codex/runner_test.go if a focused unit test is cleaner than covering this only through service tests
```

Teach the fake Linear server how labels actually mutate.

```
apply_patch on internal/service/e2e_test.go
```

Update docs last, once behavior and exact label names are final.

```
apply_patch on README.md
```

Format and test after each milestone.

```
gofmt -w internal/domain/types.go internal/tracker/tracker.go internal/tracker/linear/client.go internal/tracker/linear/client_test.go internal/service/service.go internal/orchestrator/orchestrator.go internal/orchestrator/orchestrator_test.go internal/agent/codex/runner.go internal/service/e2e_test.go
```

Run the focused tests first so failures are easier to interpret.

```
go test ./internal/tracker/linear ./internal/repoops ./internal/orchestrator ./internal/agent/codex ./internal/service
```

Then run the whole Go suite.

```
go test ./...
```

After the implementation is complete, a successful run should look like:

```
ok github.com/pmenglund/colin/internal/tracker/linear
ok github.com/pmenglund/colin/internal/repoops
ok github.com/pmenglund/colin/internal/orchestrator
ok github.com/pmenglund/colin/internal/agent/codex
ok github.com/pmenglund/colin/internal/service
ok github.com/pmenglund/colin/...
```

If one of the new label-sync tests fails, the most likely causes are wrong precedence, forgetting to remove stale labels, or leaving the fake Linear server’s issue snapshot static.

## Validation and Acceptance

Acceptance is behavioral, not structural.

First, startup must ensure all managed labels exist. A fresh fake Linear server with no labels should observe Colin looking up and then creating `paused`, `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback` exactly once each. Re-running the same startup path should reuse cached or existing labels and should not create duplicates.

Second, the orchestrator sync must keep exactly one managed Codex review label on an issue with a linked pull request. The expected mapping is:

* unresolved Codex review threads present: the issue has `codex-review: unresolved-feedback` and does not have the other two managed review labels
* no unresolved threads, a Codex `EYES` reaction exists, and either there is no `THUMBS_UP` or the latest `EYES` is newer than the latest `THUMBS_UP`: the issue has `codex-review: pending`
* no unresolved threads, a Codex `EYES` reaction exists, and the latest `THUMBS_UP` is newer than the latest `EYES`: the issue has `codex-review: approved`
* no Codex review signal at all, or no linked pull request: the issue has none of the managed review labels

Third, successful merge must remove all managed Codex review labels before the issue leaves merge-related states. This must be true even if the issue previously had `codex-review: approved` or `codex-review: unresolved-feedback`.

The strongest proof should be a fake service-level scenario in `internal/service/e2e_test.go` that mutates fake GitHub reactions and fake GitHub review threads across polls and then asserts on the fake Linear issue’s label set. A new contributor should be able to read that test alone and understand the intended board behavior.

## Idempotence and Recovery

All label operations in this plan are safe to run repeatedly. Ensuring label definitions is idempotent because the Linear client first looks up a label by name and only creates it when missing. Syncing issue labels is also idempotent because the desired end state is “exactly one of these labels” or “none of these labels”, and each poll should converge toward that state.

If implementation goes halfway and tests fail, revert only the managed review label additions, not unrelated changes in the worktree. The safest rollback is to remove the new managed label constants and the new orchestrator sync call, leaving the existing `paused` label behavior intact. Do not use destructive git commands against unrelated user changes.

If you need to retry service-level tests, make sure the fake Linear server resets its in-memory label state between tests. Each test should use its own fresh fake server instance rather than depending on cross-test cleanup.

## Artifacts and Notes

Use short, focused assertions in tests. Representative examples that the finished implementation should make true are:

```
// Assumes requested and approvedLater are timestamps declared earlier in the
// test, with approvedLater after requested.

// A review request with no approval is pending.
reviewContext := repoops.ReviewContext{
	CodexReviewRequestedAt: &requested,
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStatePending {
	t.Fatalf("state = %q, want pending", got)
}

// A later approval supersedes the request. Reassign with = because
// reviewContext is already declared in this scope.
reviewContext = repoops.ReviewContext{
	CodexReviewRequestedAt: &requested,
	CodexReviewApprovedAt:  &approvedLater,
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateApproved {
	t.Fatalf("state = %q, want approved", got)
}

// Open Codex review threads win over any reaction history.
reviewContext = repoops.ReviewContext{
	CodexReviewRequestedAt: &requested,
	CodexReviewApprovedAt:  &approvedLater,
	CodexReviewThreads:     []domain.GitHubReviewThread{{ID: "thread-1"}},
}
if got := repoops.CodexReviewStateFromContext(reviewContext); got != repoops.CodexReviewStateUnresolvedFeedback {
	t.Fatalf("state = %q, want unresolved_feedback", got)
}
```

For the fake Linear server, the issue snapshot should stop being hard-coded. The label nodes returned from `issueNode` must come from mutable server state so tests can assert real convergence instead of only mutation calls.

## Interfaces and Dependencies

At the end of this work, these interfaces and helpers must exist in concrete form.

The tracker interface in `internal/tracker/tracker.go` must include:

```
type Client interface {
	FetchCandidateIssues(ctx context.Context) ([]domain.Issue, error)
	FetchIssuesByStates(ctx context.Context, stateNames []string) ([]domain.Issue, error)
	FetchIssueStatesByIDs(ctx context.Context, issueIDs []string) ([]domain.Issue, error)
	FetchIssueByID(ctx context.Context, issueID string) (domain.Issue, error)
	UpdateIssueState(ctx context.Context, issueID string, stateName string) error
	EnsureIssueLabel(ctx context.Context, labelName string) error
	AddIssueLabel(ctx context.Context, issueID string, labelName string) error
	RemoveIssueLabel(ctx context.Context, issueID string, labelName string) error
	...
}
```

The shared domain layer in `internal/domain/types.go` must define stable label names and a helper returning the managed review label set.

The GitHub review helper must expose a pure classification API that only depends on existing `repoops.ReviewContext` fields. A shape like this is preferred:

```
type CodexReviewState string

const (
	CodexReviewStateNone               CodexReviewState = ""
	CodexReviewStatePending            CodexReviewState = "pending"
	CodexReviewStateApproved           CodexReviewState = "approved"
	CodexReviewStateUnresolvedFeedback CodexReviewState = "unresolved_feedback"
)

func CodexReviewStateFromContext(reviewContext ReviewContext) CodexReviewState
func LinearLabelForCodexReviewState(state CodexReviewState) string
```

The orchestrator must have one helper responsible for convergence of managed review labels. A shape like this is preferred:

```
func (o *Orchestrator) syncCodexReviewLabels(ctx context.Context, issues []domain.Issue)
```

That helper should not fail the entire poll tick when one issue cannot be synced. It should log and continue.

The runner merge path must have one helper that removes all managed review labels best-effort on merge success. Keep it localized so terminal cleanup does not leak into unrelated runner paths.

Revision note: Initial ExecPlan created on 2026-03-30 to define the implementation path before making any code changes. The plan records the chosen label taxonomy, precedence rules, and the need for periodic sync plus terminal cleanup so a future contributor can implement the feature without rediscovering those decisions.

21:09:51 UTCsession_started

21:09:51 UTCturn_completed

21:09:50 UTCother_message

# Mirror Codex PR Review Status Into Linear Labels

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

## Purpose / Big Picture

After this change, a person looking at the Linear board will be able to tell, from issue labels alone, whether Codex review for the linked GitHub pull request is still in progress, has finished successfully, or still has unresolved Codex feedback. Colin will also create any missing review-status labels at startup so the sync does not silently fail on a new workspace or a newly configured Linear team.

The visible result is simple. For tracked issues with a linked pull request, Colin will keep exactly one of these labels in sync with GitHub Codex review status: `codex-review: pending`, `codex-review: approved`, or `codex-review: unresolved-feedback`. If there is no current Codex review status to show, Colin will remove those managed labels so stale board state does not linger.

## Progress

- [ ] Add managed Codex review label constants and helper functions in the shared domain layer.
- [ ] Extend the tracker interface and Linear client with safe label removal support.
- [ ] Generalize service startup label initialization so Colin ensures `paused` plus all managed Codex review labels exist.
- [ ] Add a single, explicit Codex review state machine based on existing GitHub review signals.
- [ ] Add periodic best-effort label syncing for tracked issues that already have a GitHub pull request.
- [ ] Clear managed Codex review labels when a pull request is merged and the issue leaves review-related states.
- [ ] Update fake tracker servers and test doubles so add/remove label mutations change per-issue label state.
- [ ] Add focused unit tests for label creation, label removal, review-state mapping, and label sync precedence.
- [ ] Add or extend service-level fake end-to-end coverage to prove label sync behavior is observable.
- [ ] Update `README.md` so operators know which labels Colin manages and what each label means.

## Surprises & Discoveries

- Observation: Colin already has the GitHub-side data needed for this feature. `internal/repoops/manager.go` already computes three Codex review signals inside `ReviewContext`: unresolved Codex review threads, the latest `EYES` reaction from `chatgpt-codex-connector[bot]`, and the latest `THUMBS_UP` reaction from the same bot.
Evidence: `ReviewContext` already exposes `CodexReviewThreads`, `CodexReviewRequestedAt`, and `CodexReviewApprovedAt`, and `internal/agent/codex/runner.go` already uses those fields to block merge when Codex review is still pending.

- Observation: Colin can create and add Linear labels today, but it cannot remove them. Without removal support, any review label introduced by this feature would become stale as soon as GitHub state changes.
Evidence: `internal/tracker/tracker.go` only exposes `EnsureIssueLabel` and `AddIssueLabel`, and `internal/tracker/linear/client.go` implements only `issueAddLabel`-style mutations.

- Observation: startup currently ensures only the `paused` label exists. The new labels must be added to the same startup path or the first sync pass will depend on manual Linear setup.
Evidence: `internal/service/service.go` has a dedicated `ensurePausedLabel` helper called from `Service.Run`.

- Observation: the fake Linear server used by service tests does not currently persist per-issue label changes; it returns a static `e2e` label on every issue snapshot.
Evidence: `internal/service/e2e_test.go` stores label definitions in `fakeLinearServer.labels`, but `issueNode` always returns `labels.nodes = [{"name":"e2e"}]`.

## Decision Log

- Decision: Manage exactly three mutually exclusive Linear labels: `codex-review: pending`, `codex-review: approved`, and `codex-review: unresolved-feedback`.
Rationale: The issue asks for three user-visible states. A single managed state dimension is easier to understand on the Linear board than a combinatorial set of overlapping labels.
Date/Author: 2026-03-30 / Codex

- Decision: Give `unresolved-feedback` the highest precedence, then `pending`, then `approved`, and otherwise show no managed Codex review label.
Rationale: unresolved feedback is the most actionable state; if Codex review threads are still open, the board should say that even if an earlier reaction history also contains `EYES` or `THUMBS_UP`.
Date/Author: 2026-03-30 / Codex

- Decision: Sync review labels best-effort during the orchestrator poll loop for tracked non-terminal states, instead of only when a runner happens to touch the issue.
Rationale: the user wants board visibility. That requires the labels to update while an issue sits idle in `Review` or `Merge`, not only when another run starts.
Date/Author: 2026-03-30 / Codex

- Decision: Clear all managed Codex review labels explicitly on successful merge, even if the periodic sync does not inspect terminal states.
Rationale: otherwise a merged issue could retain a stale review label forever after leaving Colin’s tracked handoff states.
Date/Author: 2026-03-30 / Codex