Strict DSL · Optional AI · Playwright · Web-based pipeline from spec → runnable automation → artifacts

Write test intent once. Get Playwright code, runs, and artifacts.

This is an enterprise-grade, web-based test automation framework built around the strict FlowStep DSL. You submit a spec; the system parses it into a validated plan, generates Playwright tests (TypeScript or JavaScript), and can optionally execute the job while streaming logs and collecting artifacts (report, traces, screenshots, video).

What you get (end-to-end)

Specs that stay readable
FlowStep supports strict syntax and more concise forms, so humans can review intent quickly.

Real Playwright output
Generated tests are standard Playwright specs, plus config defaults that can be overridden per job.

Job model + traceability
Input → parsed blocks → normalized test plan → code generation → execution → artifacts.

Artifacts you can debug with
Logs via SSE, the Playwright HTML report, and optional trace/video/screenshot retention policies.

Example input

CONFIG
base_url | https://example.com
headless | true
trace | on-first-retry
video | retain-on-failure
screenshot | only-on-failure
ENDCONFIG

DATA
email | user@example.com
password | $SECRET:PASSWORD
ENDDATA

TEST | Login Smoke
S001 | NAVIGATE | /
S002 | FILL | role=textbox name="Email" | ${email}
S003 | FILL | role=textbox name="Password" | ${password}
S004 | CLICK | role=button name="Sign in"
S005 | EXPECT_VISIBLE | text="Welcome"
ENDTEST

Secrets are resolved from environment variables at runtime (not stored in the spec).
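A minimal sketch of how that resolution could work, assuming DATA values are substituted into ${name} references first and a whole-value $SECRET:NAME reference is then looked up in the environment. The function name resolveValue, the Env type, and the error messages are hypothetical and only illustrate the idea; they are not the tool's API.

```typescript
// Hypothetical helper: names and error messages are illustrative, not the tool's API.
type Env = Record<string, string | undefined>;

function resolveValue(raw: string, data: Record<string, string>, env: Env): string {
  // Replace ${name} references with DATA values, failing loudly on unknown names.
  const withVars = raw.replace(/\$\{(\w+)\}/g, (_match: string, name: string) => {
    const value = data[name];
    if (value === undefined) throw new Error(`Unknown variable: ${name}`);
    return value;
  });
  // A value that is exactly $SECRET:NAME is looked up in the environment map.
  const secret = /^\$SECRET:(\w+)$/.exec(withVars);
  if (!secret) return withVars;
  const secretName = secret[1] ?? "";
  const secretValue = env[secretName];
  if (secretValue === undefined) throw new Error(`Missing secret env var: ${secretName}`);
  return secretValue;
}
```

In a Node process you would typically pass process.env as the last argument. Taking the environment as a parameter keeps the function easy to test.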

What it is

A monorepo application (web + API + shared contracts) that turns test specs into deterministic plans, then into Playwright tests, and optionally into executed runs with durable artifacts.

FlowStep DSL parsing
CONFIG (optional), DATA (optional), and TEST (required) blocks, step commands, locator syntax, and variable resolution are all validated into a normalized TestPlan.

Playwright generation
Generates Playwright specs (TypeScript or JavaScript). Defaults can come from environment variables and be overridden in the spec’s CONFIG block.

Job execution service
Runs are modeled as jobs with a workspace directory, log streaming (SSE), and artifact serving (including the Playwright HTML report).

Critical boundary (no marketing fiction)

This tool generates Playwright code and can execute it as a job. If you want “push to repo / auto-commit” behavior, that would be an additional feature on top of what’s here. What you do get today is code output you can review and commit using your normal Git workflow.

How it works (request → artifacts)

The important part is the pipeline discipline. The system keeps the intermediate artifacts, so failures are explainable, not mysterious.

Create a job

Submit the steps text, choose TypeScript or JavaScript output, and decide whether to run immediately.

Optional: interpret natural language

If enabled, NL lines can be converted into FlowStep DSL using a configured LLM provider.

Parse DSL → blocks

CONFIG/DATA/TEST blocks are parsed into a structured representation.
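As a rough illustration of this step, the sketch below splits a spec like the example input into CONFIG, DATA, and TEST blocks. The ParsedBlocks shape and the parseBlocks function are assumptions made for this example; the real parser's types are not shown here, and this version does not check that each END marker matches the block it closes.

```typescript
// Hypothetical shapes: the real parser's types are not shown in the source.
interface TestBlock { name: string; steps: string[] }
interface ParsedBlocks {
  config: Record<string, string>;
  data: Record<string, string>;
  tests: TestBlock[];
}

function parseBlocks(spec: string): ParsedBlocks {
  const out: ParsedBlocks = { config: {}, data: {}, tests: [] };
  let section: "CONFIG" | "DATA" | "TEST" | null = null;
  let current: TestBlock | null = null;
  for (const rawLine of spec.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line === "CONFIG" || line === "DATA") { section = line; continue; }
    if (line === "ENDCONFIG" || line === "ENDDATA" || line === "ENDTEST") { section = null; continue; }
    if (line.startsWith("TEST |")) {
      current = { name: line.slice("TEST |".length).trim(), steps: [] };
      out.tests.push(current);
      section = "TEST";
      continue;
    }
    if (section === "CONFIG" || section === "DATA") {
      // CONFIG and DATA lines are "key | value"; split on the first pipe only.
      const sep = line.indexOf("|");
      if (sep < 0) throw new Error(`Expected "key | value": ${line}`);
      const target = section === "CONFIG" ? out.config : out.data;
      target[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
    } else if (section === "TEST" && current !== null) {
      current.steps.push(line); // step lines are kept raw for the validation stage
    } else {
      throw new Error(`Line outside any block: ${line}`);
    }
  }
  if (out.tests.length === 0) throw new Error("At least one TEST block is required");
  return out;
}
```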

Validate + normalize → TestPlan

Steps become a normalized plan with resolved variables, validated command shapes, and consistent semantics.
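One way to picture "validated command shapes" is an arity check per command. The sketch below covers only the four commands that appear in this document; the ARG_COUNT table, the PlanStep shape, and normalizeStep are hypothetical, and the sketch assumes strict-form lines whose values contain no pipe characters.

```typescript
// Hypothetical arity table covering only the commands that appear in this document.
const ARG_COUNT = new Map<string, number>([
  ["NAVIGATE", 1], // target path or URL
  ["FILL", 2], // locator, value
  ["CLICK", 1], // locator
  ["EXPECT_VISIBLE", 1], // locator
]);

interface PlanStep { id: string; command: string; args: string[] }

function normalizeStep(line: string): PlanStep {
  // Strict form: "S001 | COMMAND | arg | arg"
  const [id = "", command = "", ...args] = line.split("|").map((part) => part.trim());
  if (!/^S\d{3}$/.test(id)) throw new Error(`Bad step id: "${id}"`);
  const expected = ARG_COUNT.get(command);
  if (expected === undefined) throw new Error(`Unknown command: "${command}"`);
  if (args.length !== expected || args.some((arg) => arg === "")) {
    throw new Error(`${command} expects ${expected} argument(s), got ${args.length}`);
  }
  return { id, command, args };
}
```

Rejecting a malformed step here, before any code is generated, is what lets a failure point at a specific step ID rather than at a Playwright stack trace.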

Optional: locator enrichment

For ambiguous intent, enrichment can produce a deterministic selection trace for complex target choices.

Generate Playwright code

Emit runnable Playwright spec(s) and supporting files in a job workspace.
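To make the generation step concrete, here is a sketch of turning plan steps into Playwright source text. The Playwright calls it emits (page.goto, getByRole, getByText, locator, fill, click, expect(...).toBeVisible) are real APIs, but the mapping from FlowStep commands to those calls, the GenStep shape, and the choice to emit secrets as process.env lookups are assumptions for illustration, not the tool's actual emitter.

```typescript
// Hypothetical generator: the tool's real emitter is not shown in the source.
interface GenStep { command: string; args: string[] }

function locatorExpr(locator: string): string {
  const role = /^role=(\w+)\s+name="(.*)"$/.exec(locator);
  if (role) return `page.getByRole(${JSON.stringify(role[1])}, { name: ${JSON.stringify(role[2])} })`;
  const text = /^text="(.*)"$/.exec(locator);
  if (text) return `page.getByText(${JSON.stringify(text[1])})`;
  return `page.locator(${JSON.stringify(locator)})`; // CSS or XPath passes through
}

function valueExpr(value: string): string {
  // Assumption: secrets are emitted as env lookups so they never land in generated code.
  const secret = /^\$SECRET:(\w+)$/.exec(value);
  return secret ? `process.env.${secret[1]} ?? ""` : JSON.stringify(value);
}

function emitStep(step: GenStep): string {
  const [first = "", second = ""] = step.args;
  switch (step.command) {
    case "NAVIGATE": return `await page.goto(${JSON.stringify(first)});`;
    case "FILL": return `await ${locatorExpr(first)}.fill(${valueExpr(second)});`;
    case "CLICK": return `await ${locatorExpr(first)}.click();`;
    case "EXPECT_VISIBLE": return `await expect(${locatorExpr(first)}).toBeVisible();`;
    default: throw new Error(`No emitter for command: ${step.command}`);
  }
}

function emitTest(name: string, steps: GenStep[]): string {
  const body = steps.map((step) => `  ${emitStep(step)}`).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(name)}, async ({ page }) => {`,
    body,
    `});`,
  ].join("\n");
}
```

Because the output is ordinary Playwright source, it can be reviewed and committed like hand-written tests.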

Optional: execute

Run Playwright. Stream logs via Server-Sent Events. Capture outputs based on configured retention.
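Server-Sent Events are plain text frames: optional id and event fields, one data field per payload line, and a blank line that ends the event. The sketch below shows that framing for a log line. The function name formatSseEvent and the event name "log" are assumptions; the actual event names the API uses are not documented here.

```typescript
// Hypothetical helper; the event name used by the real API is an assumption.
function formatSseEvent(event: string, data: string, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${event}`);
  // Each line of the payload needs its own "data:" field.
  for (const part of data.split(/\r?\n/)) lines.push(`data: ${part}`);
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

On the browser side, a standard EventSource pointed at the job's log endpoint would receive these frames and can subscribe to a named event with addEventListener.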

Serve artifacts

Fetch logs, parsed plan, artifacts, and the Playwright HTML report for review and debugging.

Why this matters

A lot of “rapid automation” tools hide the intermediate decisions. This one makes the pipeline explicit: input, parse, normalize, generate, run, artifacts. That’s what keeps speed from turning into chaos.

FlowStep DSL (strict, but practical)

FlowStep supports three levels of explicitness (concise, more explicit, and strict), plus a locator syntax designed for Playwright selectors (roles, text, CSS, XPath).

Syntax comparison (same intent, different strictness)

// Concise (inference)
S001 FILL | "Email" | user@example.com
S002 CLICK | "Login"

// More explicit
S001 FILL | "Email" textbox | user@example.com
S002 CLICK | "Login" button

// Strict (full selector control)
S001 | FILL | role=textbox name="Email" | user@example.com
S002 | CLICK | role=button name="Login"

Variables can be referenced as ${name}. Secrets are $SECRET:NAME (resolved from env).
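The three forms above suggest that concise targets are expanded into strict locators before validation. The sketch below shows one way that expansion could look. The default-role table (FILL implies textbox, CLICK implies button) is purely an assumption inferred from the examples, and toStrictLocator is a hypothetical name; the tool's real inference rules are not documented here.

```typescript
// Hypothetical inference table: the tool's actual rules are not documented here.
const DEFAULT_ROLE = new Map<string, string>([
  ["FILL", "textbox"],
  ["CLICK", "button"],
]);

function toStrictLocator(command: string, target: string): string {
  // Matches "Name" or "Name" role; anything else is treated as already strict.
  const match = /^"([^"]+)"(?:\s+(\w+))?$/.exec(target.trim());
  if (!match) return target.trim();
  const name = match[1] ?? "";
  const role = match[2] ?? DEFAULT_ROLE.get(command);
  if (role === undefined) throw new Error(`Cannot infer a role for ${command} "${name}"`);
  return `role=${role} name="${name}"`;
}
```

Whatever the real rules are, expanding to the strict form early means the rest of the pipeline only has to understand one locator syntax.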

What the DSL covers

CONFIG defaults (browser/headless/timeouts/workers/retries/trace/video/screenshot, etc.)
DATA values and secret references
TEST blocks with step commands and validation
Locator syntax suitable for Playwright selector strategies
Optional decision trace for complex or ambiguous selection intent
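For the CONFIG defaults in particular, the sketch below shows how env-level defaults and a spec's CONFIG block might be merged into Playwright "use" options. baseURL, headless, trace, video, and screenshot are real Playwright options, and the values in the example input (on-first-retry, retain-on-failure, only-on-failure) are valid modes. The function toUseOptions, the UseOptions interface, and the key-to-option mapping are assumptions for illustration.

```typescript
// Hypothetical mapping; only keys shown in the example CONFIG block are handled.
interface UseOptions {
  baseURL?: string;
  headless?: boolean;
  trace?: string;
  video?: string;
  screenshot?: string;
}

function toUseOptions(
  envDefaults: Record<string, string>,
  config: Record<string, string>,
): UseOptions {
  // Later spreads win, so the spec's CONFIG block overrides env-level defaults.
  const merged: Record<string, string | undefined> = { ...envDefaults, ...config };
  const use: UseOptions = {};
  const baseUrl = merged["base_url"];
  if (baseUrl !== undefined) use.baseURL = baseUrl;
  const headless = merged["headless"];
  if (headless !== undefined) {
    if (headless !== "true" && headless !== "false") {
      throw new Error(`headless must be true or false, got "${headless}"`);
    }
    use.headless = headless === "true";
  }
  for (const key of ["trace", "video", "screenshot"] as const) {
    const value = merged[key];
    if (value !== undefined) use[key] = value;
  }
  return use;
}
```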

Natural language steps (accepted patterns)

Natural language lines (without pipe syntax) can be interpreted into FlowStep DSL when an LLM provider is enabled. The intent must still be clear and test-oriented. These are examples of supported patterns.

Accepted natural language examples

Navigate to the login page

Fill in the Email field with user@example.com

Type the password into the Password textbox

Click the Sign In button

Check the Remember me checkbox

Wait for the dashboard to load

Verify that "Welcome" is visible

Expect the error message "Invalid credentials"

Select "Canada" from the Country dropdown

Each line must express a single, clear test action or assertion.

How they are converted internally

// Natural language
Click the Sign In button

// Converted FlowStep
S004 | CLICK | role=button name="Sign In"

// Natural language
Verify that "Welcome" is visible

// Converted FlowStep
S005 | EXPECT_VISIBLE | text="Welcome"

// Natural language
Fill in the Email field with user@example.com

// Converted FlowStep
S002 | FILL | role=textbox name="Email" | user@example.com

The conversion produces strict DSL, which then follows the same deterministic pipeline as manually written FlowStep.

Important constraint

Natural language is interpreted into structured FlowStep before validation and generation. The system does not execute raw English. It executes validated DSL. That separation keeps AI-assisted input from turning into unpredictable automation.

LLM support (optional, controlled)

LLMs are used in two distinct places: converting natural language into FlowStep DSL, and enriching locators/decisions. You can run a fully local provider (Ollama) or a remote OpenAI-compatible one, and restrict which remote base URLs are allowed.

Mode 1: NL → DSL interpretation
Convert plain-English steps into FlowStep. The output is structured and reviewable before generation.

Mode 2: locator enrichment
When intent is complex, enrichment can produce a deterministic selection command sequence and an “AI decision trace.”

Security posture
Remote API keys remain server-side. Remote base URLs can be allowlisted to prevent exfiltration to arbitrary hosts.
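A base-URL allowlist is easy to get subtly wrong with string prefix checks, so here is a sketch that compares parsed origins instead. The function names and the idea of storing the allowlist as a list of URL strings are assumptions; how the tool actually stores or matches its allowlist is not described here. The hosts are placeholders.

```typescript
// Hypothetical check; how the tool stores or matches its allowlist is an assumption.
function parseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null; // not a valid absolute URL
  }
}

function isAllowedBaseUrl(candidate: string, allowlist: string[]): boolean {
  const url = parseUrl(candidate);
  if (url === null) return false;
  // Origin covers scheme, host, and port, so look-alike hosts do not match.
  return allowlist.some((entry) => parseUrl(entry)?.origin === url.origin);
}
```

Comparing origins means a host such as api.example.com.evil.test is rejected even though its URL starts with the same characters as an allowed entry.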

Practical takeaway

Use LLMs where they save time, but keep the system grounded in deterministic outputs: DSL, validated plan, and standard Playwright code.

Architecture (monorepo)

The project is split into a backend service, a frontend service, and shared contracts/schemas. Jobs and artifacts live under a workspace directory.

apps/api (backend)
Job creation, parsing/validation, optional LLM services, code generation, execution, SSE log streaming, and artifact serving.

apps/web (frontend)
UI for creating jobs, selecting LLM provider modes, viewing plans, streaming logs, and accessing reports/artifacts.

packages/shared
Shared types/contracts and schemas to keep API ↔ UI aligned and reduce drift.

Runtime notes

The workspace directory holds job inputs, generated code, logs, and artifacts.
Jobs are durable units: they keep the parsed plan, run outputs, and downloadable reports.
Suites can group multiple specs and run them as jobs.

Deploy a self-contained Playwright repo (TS or JS)

In this app, “deployment” means more than generating a test file. You can choose an output format (Playwright + TypeScript or Playwright + JavaScript), then deploy the generated framework into a selected external repository root. The result is a self-contained Playwright environment where the generated scripts can be run immediately.

Pick the target format
Generate Playwright tests as either TypeScript or JavaScript. The app produces the matching project setup so the repo stays consistent with the chosen language.

Select an external repo root
Choose a destination repository root (your existing repo or a fresh folder). The app deploys the generated Playwright project structure into that location.

Ready-to-run Playwright environment
The deployed output contains everything needed to run Playwright: config, dependencies, and a sane folder layout for tests and artifacts.

What gets deployed

package.json with Playwright dependencies and scripts (install / test / report)
Playwright config (playwright.config.ts for TS or playwright.config.js for JS)
Test folder layout (tests/ or similar) containing generated specs
Optional helpers / fixtures / shared utilities needed by the generated output
Artifacts/report output directories (trace/video/screenshots + HTML report)
Optional README stub in the deployed repo with “run” instructions

How deployment works (conceptually)

1) Input spec (FlowStep + optional NL)
2) Parse + validate → TestPlan
3) Generate Playwright tests (TS or JS)
4) Assemble a complete repo scaffold for the selected format
5) Write files into the selected external repository root
6) Result: `npm install` + `npx playwright install` + `npm test`

This is “deployment” as an output deliverable: a runnable repo, not just generated code.
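Steps 4 and 5 above can be pictured as building a map of file paths to contents and then writing it to disk. The sketch below covers only the assembly half. The commands playwright test and playwright show-report, and the config options defineConfig, testDir, and reporter, are real Playwright features; the package name, the "latest" version tag, the tests/ layout, and the buildScaffold function itself are assumptions for illustration.

```typescript
// Hypothetical scaffold builder; file names follow the "What gets deployed" list, contents are illustrative.
type Format = "ts" | "js";

function buildScaffold(format: Format, specs: Record<string, string>): Record<string, string> {
  const files: Record<string, string> = {};
  files["package.json"] = JSON.stringify(
    {
      name: "generated-playwright-tests", // assumed name
      private: true,
      scripts: { test: "playwright test", report: "playwright show-report" },
      devDependencies: { "@playwright/test": "latest" }, // pin a real version in practice
    },
    null,
    2,
  );
  // The config file differs only in module syntax between the two formats.
  const importLine =
    format === "ts"
      ? `import { defineConfig } from "@playwright/test";`
      : `const { defineConfig } = require("@playwright/test");`;
  const exportStart = format === "ts" ? "export default" : "module.exports =";
  files[`playwright.config.${format}`] = [
    importLine,
    ``,
    `${exportStart} defineConfig({`,
    `  testDir: "./tests",`,
    `  reporter: "html",`,
    `});`,
  ].join("\n");
  for (const [name, source] of Object.entries(specs)) {
    files[`tests/${name}.spec.${format}`] = source;
  }
  return files;
}
```

Keeping assembly separate from the file writes makes it possible to preview or diff the scaffold before anything touches the target repository.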

Why this matters

It removes the usual setup tax. Instead of “here’s a test file, now wire Playwright,” you get a complete runnable repository skeleton that can be committed, reviewed, and executed in CI.

Disclaimer: This app is entirely owned by Emil Zimmermann. Any use of this app must be authorized by the owner.