How I use AI to write Playwright tests (without it getting out of hand)

A few months back a teammate sent me an “AI-generated” test suite to sign off on. All green, looked great. It took me ten minutes to notice that half of it tested nothing: clicks that never checked the result, selectors glued to the CSS that broke the moment someone touched the HTML, and a couple of expect calls that passed no matter what you did.

That’s the trap of asking the model to “write all the tests”. It writes them. And they look fine. The problem is that a test that doesn’t fail when something actually breaks is worse than no test at all, because it keeps you calm for no reason.

And still, I use AI with Playwright almost every day and it saves me a ton of time. The trick is in how you split the work.

I decide what gets tested, AI helps me write it

The part I never hand off is deciding what matters: which flows are critical, what happens if this breaks on a Friday at six, what “correct” even means here. That takes knowing the product and the business, and the model knows neither.

What I do hand off, once the case is clear, is the mechanical part. That’s where I actually save time:

I give it a manual test case (the steps and what should happen) and it returns a first Playwright draft that I then review and harden.
I ask it to pull repeated selectors out into page objects. It’s boring, repetitive work, exactly the kind I don’t feel like doing by hand.
When a test goes flaky, I paste in the error and the trace and ask it what’s going on. Nine times out of ten it’s a badly placed wait.
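
For the page-object item, this is roughly the shape I’d expect back for a login screen. Treat it as a sketch: LoginPage, the /login path, and the field and button names are invented for illustration. PageLike and LocatorLike are small stand-ins for Playwright’s Page and Locator types, declared here only so the snippet reads on its own; in a real project you would import the real types from @playwright/test.

```typescript
// Stand-ins for Playwright's Locator and Page, reduced to the methods used below.
// In a real project: import type { Page, Locator } from '@playwright/test';
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): LocatorLike;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

// Hypothetical page object: every locator for the login screen lives here,
// so a markup change means one edit instead of one per test.
class LoginPage {
  private readonly page: PageLike;
  private readonly email: LocatorLike;
  private readonly password: LocatorLike;
  private readonly submit: LocatorLike;

  constructor(page: PageLike) {
    this.page = page;
    // Labels and the accessible button name are assumptions about the UI.
    this.email = page.getByLabel('Email');
    this.password = page.getByLabel('Password');
    this.submit = page.getByRole('button', { name: 'Sign in' });
  }

  async open(): Promise<void> {
    // A relative path assumes baseURL is set in the Playwright config.
    await this.page.goto('/login');
  }

  async signIn(email: string, password: string): Promise<void> {
    await this.email.fill(email);
    await this.password.fill(password);
    await this.submit.click();
  }
}
```

In a test this shrinks the setup to creating a LoginPage, calling open() and signIn(), and then asserting on what the user should actually see. The assertion stays in the test, not in the page object, so it’s obvious what each test checks.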

Selectors are where you can tell if someone knows what they’re doing

By default AI reaches for CSS or exact-text selectors, and those are the first to break. I always steer it toward the locators Playwright actually recommends: getByRole, getByLabel, getByText, and getByTestId when there’s no clear role. CSS or XPath only when there’s no other option.

One line in the prompt (“use getByRole and getByTestId, no fragile CSS selectors”) completely changes what you get back. Same with waits: ban waitForTimeout and ask for assertions that wait on their own, like expect(locator).toBeVisible(). That alone kills off half your flaky tests.
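
Retyping those rules in every prompt gets old, so one option is to write them down once and attach them to each request. A minimal sketch, assuming a manual test case is just a title, steps, and an expected result; ManualTestCase, PROMPT_RULES, and buildTestPrompt are hypothetical names, and the example case is invented.

```typescript
// Hypothetical shape for a manual test case: the steps and what should happen.
interface ManualTestCase {
  title: string;
  steps: string[];
  expected: string;
}

// The guardrails from the text, stated once so every prompt carries them.
const PROMPT_RULES: string[] = [
  'Use getByRole, getByLabel, getByText, or getByTestId. No fragile CSS or XPath selectors.',
  'Never call waitForTimeout.',
  'Wait through auto-retrying assertions such as expect(locator).toBeVisible().',
];

// Builds the text of the request; sending it to a model is out of scope here.
function buildTestPrompt(testCase: ManualTestCase): string {
  const steps = testCase.steps.map((step, i) => `${i + 1}. ${step}`).join('\n');
  const rules = PROMPT_RULES.map((rule) => `- ${rule}`).join('\n');
  return [
    `Write a Playwright test in TypeScript for: ${testCase.title}`,
    `Steps:\n${steps}`,
    `Expected result: ${testCase.expected}`,
    `Rules:\n${rules}`,
  ].join('\n\n');
}

// Example usage with an invented case.
const prompt = buildTestPrompt({
  title: 'Sign in with valid credentials',
  steps: ['Open the login page', 'Fill in email and password', 'Click "Sign in"'],
  expected: 'The dashboard heading is visible',
});
console.log(prompt);
```

The output is only a string, so it works with whatever model or tool you use. The point is that the rules travel with every request instead of depending on my memory.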

What I check before merging

I don’t care whether the AI wrote the test or I did. Before it goes in, it gets the same list:

Does it test something a real user does, or some internal detail nobody cares about?
Would the assertion fail if the feature actually broke?
Does it survive a layout change?
Is it deterministic, with no sleeps or order dependencies?
And the big one: do I understand it well enough to fix it six months from now, when it blows up and I don’t even remember writing this code?

If it doesn’t pass that list, it’s out. Doesn’t matter who wrote it.
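
A small part of that list can be checked mechanically before a human even opens the file. The sketch below scans a test file’s source for the patterns a regex can spot: waitForTimeout calls, XPath locators, and locator strings built on classes or ids. The function name and the patterns are assumptions for illustration. It’s a rough heuristic, not a Playwright feature, and it says nothing about whether the test checks anything meaningful; that part still needs a person.

```typescript
interface Finding {
  line: number; // 1-based line number in the scanned source
  rule: string;
  text: string;
}

// Heuristic patterns only: they will miss cases and may flag harmless ones.
const BANNED: { rule: string; pattern: RegExp }[] = [
  { rule: 'no fixed sleeps', pattern: /\.waitForTimeout\s*\(/ },
  { rule: 'no XPath locators', pattern: /\.locator\(\s*['"`](xpath=|\/\/)/ },
  { rule: 'no class/id CSS locators', pattern: /\.locator\(\s*['"`][^'"`]*[.#][\w-]+/ },
];

// Hypothetical helper: returns one finding per (line, rule) match.
function findFragilePatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of BANNED) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return findings;
}

// Example: the first two lines get flagged, the getByRole line does not.
const sample = [
  "await page.locator('.btn-primary').click();",
  'await page.waitForTimeout(3000);',
  "await page.getByRole('button', { name: 'Save' }).click();",
].join('\n');

const findings = findFragilePatterns(sample);
console.log(findings);
```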

So, the short version

AI doesn’t do my QA for me. It clears the tedious stuff off my plate so I can spend the time on the part that’s actually hard: thinking about risk, deciding what’s worth testing, and arguing it out with the team. Writing tests faster is nice. Writing the right tests is what counts, and that part is still on me.