Agentic Workflows

The Flakiness CLI ships with skills — structured documentation packages that teach AI coding agents how to query and analyze your test data. Once installed, an agent can investigate flaky tests, find regressions, and build FQL queries on its own.

Supported Agents

Agent	Skill directory
Claude Code	`.claude/skills/`
Codex	`.codex/skills/`
Cursor	`.cursor/skills/`

Installing Skills

flakiness skills install --agent <claude|codex|cursor>

This installs the `flakiness-investigation` skill, which teaches the agent how to use the Flakiness CLI to query test data. The skill covers the `flakiness list tests` command and all its options, FQL filter syntax, and common investigation recipes.

Once installed, you can ask your agent things like:

“Fix my PR’s failing tests” (uses `--pr` to fetch failures from a specific pull request)
“Find the most flaky tests in our project”
“Show me all regressions in the e2e/ directory”
“Which tests have been failing with timeout errors?”

The agent will translate your request into the appropriate `flakiness list tests` command with the right FQL filters.

Restart your agent after installation to pick up new skills.
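
Before restarting, it can be worth confirming that the skill landed in the directory the agent reads. The sketch below maps each agent to its directory from the Supported Agents table and checks that the directory exists. It assumes the directories are relative to the current working directory, which this page does not state. `skill_dir` and `check_skill_dir` are hypothetical helper names, not CLI commands.

```shell
# Hypothetical helper: map an agent name to its skill directory
# (values copied from the Supported Agents table).
skill_dir() {
  case "$1" in
    claude) echo ".claude/skills/" ;;
    codex)  echo ".codex/skills/" ;;
    cursor) echo ".cursor/skills/" ;;
    *)      echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

# Report whether that directory exists relative to the current directory
# (assumption: skills are installed relative to where the command was run).
check_skill_dir() {
  dir=$(skill_dir "$1") || return 1
  if [ -d "$dir" ]; then
    echo "found $dir"
  else
    echo "missing $dir"
  fi
}
```

For example, `check_skill_dir claude` prints either `found .claude/skills/` or `missing .claude/skills/`.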

Example: Fix PR Tests

The most common agentic workflow is fixing test failures in a pull request. With the skill installed, just ask your agent:

Fix the failing tests in PR #42 in myorg/myproject

The agent will:

Run `flakiness list tests --project myorg/myproject --pr 42 --fql 's:regressed'` to find tests that the PR broke (tests that pass on the target branch but fail in the PR)
Read the reported file paths and error messages
Make targeted code fixes
Ignore failed tests (pre-existing failures on the target branch) and flaked tests (passed on retry)

This works with any supported agent — Claude Code, Codex, or Cursor.
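
The query from the first step can also be run by hand to preview what the agent will see. The wrapper below is only a sketch: `list_pr_regressions` and the `DRY_RUN` variable are hypothetical conveniences, and it uses only the flags already shown on this page.

```shell
# Hypothetical wrapper around the regression query the agent runs.
# With DRY_RUN=1 it prints the command instead of calling the CLI.
list_pr_regressions() {
  project=$1   # e.g. myorg/myproject
  pr=$2        # pull request number, e.g. 42
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "flakiness list tests --project $project --pr $pr --fql 's:regressed'"
  else
    flakiness list tests --project "$project" --pr "$pr" --fql 's:regressed'
  fi
}
```

Calling `list_pr_regressions myorg/myproject 42` with `DRY_RUN=1` set echoes the command, which is handy for checking the arguments before spending a real query.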

Example: Deflake Cron Job

You can set up a scheduled GitHub Actions workflow that uses Claude Code to automatically investigate and fix flaky tests.

This workflow:

Runs on a schedule (e.g. every Monday at 9 AM)
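Uses GitHub OIDC to authenticate with Flakiness.io, so no Flakiness credentials need to be stored as secrets (the Anthropic API key still comes from the `ANTHROPIC_API_KEY` secret)
Runs Claude Code in non-interactive mode (`-p`) with the flakiness skill to find flaky tests, investigate root causes, and open a PR with fixes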

name: Deflake Tests
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9:00 UTC
  workflow_dispatch: {}   # Allow manual trigger

permissions:
  contents: write
  pull-requests: write
  id-token: write

jobs:
  deflake:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Flakiness CLI
        run: curl -LsSf https://cli.flakiness.io/install.sh | sh

      - name: Install Flakiness skill
        run: flakiness skills install --agent claude

      - name: Install Claude Code
        run: npm i -g @anthropic-ai/claude-code

      - name: Deflake
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          FLAKINESS_PROJECT: myorg/myproject
        run: |
          claude -p "Use the flakiness skill to find the top 5 flakiest tests \
            in our project. For each one, investigate the root cause in our \
            codebase and open a PR with a fix. Use a separate PR per test."

How It Works

`flakiness skills install --agent claude` installs the `flakiness-investigation` skill so Claude Code knows how to use the CLI
`claude -p "..."` runs Claude Code with a prompt in non-interactive (print) mode
Claude Code reads the installed skill, runs `flakiness list tests --fql 'flip>0%' --sort flip_rate --sort-dir desc` to find flaky tests, then investigates and fixes each one
GitHub OIDC handles authentication transparently, which is why the workflow requests the `id-token: write` permission; the `FLAKINESS_PROJECT` env var tells the CLI which project to query
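
Because the Deflake step depends on both `ANTHROPIC_API_KEY` and `FLAKINESS_PROJECT`, a fail-fast check before invoking the agent can save a wasted run. This is a minimal sketch; `require_env` is a hypothetical helper and not part of either CLI.

```shell
# Hypothetical pre-flight check: return non-zero if any named
# environment variable is unset or empty, listing each missing one.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In the workflow, this could run at the top of the Deflake step as `require_env ANTHROPIC_API_KEY FLAKINESS_PROJECT || exit 1`.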

You can customize the prompt to focus on specific areas:

# Only investigate regressions in e2e tests
claude -p "Use the flakiness skill to find regressions in e2e/ files and fix them."

# Focus on slow tests
claude -p "Use the flakiness skill to find tests slower than 10s and optimize them."

# Investigate tests failing with a specific error
claude -p "Use the flakiness skill to find tests failing with 'timeout' errors and fix the root cause."
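
If one workflow should cover several of these focus areas, the prompt can be chosen from a variable. The sketch below reuses the prompts above; `deflake_prompt` and the focus names (`e2e`, `slow`, `timeout`) are hypothetical and not defined by the Flakiness CLI or Claude Code.

```shell
# Hypothetical helper: pick one of the prompts above by focus name.
deflake_prompt() {
  case "$1" in
    e2e)     echo "Use the flakiness skill to find regressions in e2e/ files and fix them." ;;
    slow)    echo "Use the flakiness skill to find tests slower than 10s and optimize them." ;;
    timeout) echo "Use the flakiness skill to find tests failing with 'timeout' errors and fix the root cause." ;;
    *)       echo "unknown focus: $1" >&2; return 1 ;;
  esac
}
```

The result can then be passed as the prompt argument of the `claude` command used in the workflow, for example after `prompt=$(deflake_prompt e2e) || exit 1`.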