Agentic Workflows

The Flakiness CLI ships with skills — structured documentation packages that teach AI coding agents how to query and analyze your test data. Once installed, an agent can investigate flaky tests, find regressions, and build FQL queries on its own.

Agent        Skill directory
Claude Code  .claude/skills/
Codex        .codex/skills/
Cursor       .cursor/skills/

flakiness skills install --agent <claude|codex|cursor>

This installs the flakiness-investigation skill, which teaches the agent how to use the Flakiness CLI to query test data. The skill covers the flakiness list tests command and all its options, FQL filter syntax, and common investigation recipes.
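
As a quick sanity check after installing, something like the following could confirm that the skill directory for your agent exists. This is only a minimal sketch: the skill_dir helper is hypothetical, and it assumes the directories listed for each agent are relative to the directory you run the script from, which this page does not state.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: map an agent name to its documented skill directory.
skill_dir() {
  case "$1" in
    claude) echo ".claude/skills/" ;;
    codex)  echo ".codex/skills/" ;;
    cursor) echo ".cursor/skills/" ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

agent="claude"
dir="$(skill_dir "$agent")"

# Assumption: the directory is relative to the current working directory.
if [ -d "$dir" ]; then
  echo "skill directory present: $dir"
else
  echo "skill directory missing: $dir (try: flakiness skills install --agent $agent)"
fi
```

Change the agent variable to codex or cursor to check the other supported agents.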

Once installed, you can ask your agent things like:

  • “Fix my PR’s failing tests” (uses --pr to fetch failures from a specific pull request)
  • “Find the most flaky tests in our project”
  • “Show me all regressions in the e2e/ directory”
  • “Which tests have been failing with timeout errors?”

The agent will translate your request into the appropriate flakiness list tests command with the right FQL filters.

Restart your agent after installation to pick up new skills.

The most common agentic workflow is fixing test failures in a pull request. With the skill installed, just ask your agent:

Fix the failing tests in PR #42 in myorg/myproject

The agent will:

  1. Run flakiness list tests --project myorg/myproject --pr 42 --fql 's:regressed' to find tests that the PR broke (tests passing on the target branch but failing in the PR)
  2. Read the reported file paths and error messages
  3. Make targeted code fixes
  4. Ignore failed tests (pre-existing failures on the target branch) and flaked tests (passed on retry)
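
Step 1 can also be reproduced by hand when you want to see the same data the agent sees. The sketch below is a hypothetical wrapper: list_pr_regressions and DRY_RUN are names made up for illustration, while the flags and the FQL filter are the ones quoted in step 1. By default it only prints the command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical wrapper around the query from step 1.
# DRY_RUN=1 (the default here) prints the command; DRY_RUN=0 runs it.
list_pr_regressions() {
  project="$1"
  pr="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "flakiness list tests --project $project --pr $pr --fql 's:regressed'"
  else
    # s:regressed selects tests that pass on the target branch but fail in the PR.
    flakiness list tests --project "$project" --pr "$pr" --fql 's:regressed'
  fi
}

# Placeholder project and PR number taken from the example prompt.
list_pr_regressions myorg/myproject 42
```

Running it with DRY_RUN=0 assumes the Flakiness CLI is installed and already authenticated.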

This works with any supported agent — Claude Code, Codex, or Cursor.

You can set up a scheduled GitHub Actions workflow that uses Claude Code to automatically investigate and fix flaky tests.

This workflow:

  1. Runs on a schedule (e.g. every Monday at 9 AM)
  2. Uses GitHub OIDC to authenticate with Flakiness.io, so no Flakiness secret is needed (the Anthropic API key is still supplied as a repository secret)
  3. Runs Claude Code in non-interactive mode (-e) with the flakiness skill to find flaky tests, investigate root causes, and open a PR with fixes
.github/workflows/deflake.yml
name: Deflake Tests
on:
  schedule:
    - cron: '0 9 * * 1' # Every Monday at 9:00 UTC
  workflow_dispatch: {} # Allow manual trigger
permissions:
  contents: write
  pull-requests: write
  id-token: write
jobs:
  deflake:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Flakiness CLI
        run: curl -LsSf https://cli.flakiness.io/install.sh | sh
      - name: Install Flakiness skill
        run: flakiness skills install --agent claude
      - name: Install Claude Code
        run: npm i -g @anthropic-ai/claude-code
      - name: Deflake
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          FLAKINESS_PROJECT: myorg/myproject
        run: |
          claude -e "Use the flakiness skill to find the top 5 flakiest tests \
          in our project. For each one, investigate the root cause in our \
          codebase and open a PR with a fix. Use a separate PR per test."

How the workflow runs:

  1. flakiness skills install --agent claude installs the flakiness-investigation skill so Claude Code knows how to use the CLI
  2. claude -e "..." runs Claude Code with a prompt in non-interactive mode
  3. Claude Code reads the installed skill, runs flakiness list tests --fql 'flip>0%' --sort flip_rate --sort-dir desc to find flaky tests, then investigates and fixes each one
  4. GitHub OIDC handles authentication with Flakiness.io (this is why the job requests the id-token: write permission), and the FLAKINESS_PROJECT env var tells the CLI which project to query
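
To preview the list Claude Code starts from, the query from step 3 can be run on its own. The following is a rough sketch rather than part of the workflow: list_flaky_tests and DRY_RUN are hypothetical names, and the sketch assumes the CLI is installed and authenticated when DRY_RUN=0. The filter and sort flags are the ones quoted in step 3.

```shell
#!/bin/sh
set -eu

# Hypothetical wrapper around the flaky-test query from step 3.
# DRY_RUN=1 (the default here) prints the command; DRY_RUN=0 runs it.
list_flaky_tests() {
  # The CLI takes the project from FLAKINESS_PROJECT, as in the workflow.
  if [ -z "${FLAKINESS_PROJECT:-}" ]; then
    echo "FLAKINESS_PROJECT is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "flakiness list tests --fql 'flip>0%' --sort flip_rate --sort-dir desc"
  else
    flakiness list tests --fql 'flip>0%' --sort flip_rate --sort-dir desc
  fi
}

# Placeholder project name from the workflow file.
export FLAKINESS_PROJECT="myorg/myproject"
list_flaky_tests
```

Checking FLAKINESS_PROJECT up front is a design choice of this sketch, so that a missing project shows up as a clear message before the CLI is called.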

You can customize the prompt to focus on specific areas:

# Only investigate regressions in e2e tests
claude -e "Use the flakiness skill to find regressions in e2e/ files and fix them."
# Focus on slow tests
claude -e "Use the flakiness skill to find tests slower than 10s and optimize them."
# Investigate tests failing with a specific error
claude -e "Use the flakiness skill to find tests failing with 'timeout' errors and fix the root cause."