June 8th, 2026 by Adam Sandman
Nordic Testing Days 2026 in Tallinn, Estonia was a timely reminder that the testing profession is not standing still. As artificial intelligence becomes embedded into products, workflows, chatbots, agents, and decision-support systems, the role of testers is expanding beyond traditional verification. We are increasingly being asked to provide something broader and more valuable: confidence.
The conference, held June 3-5, 2026, brought together testers, quality engineers, developers, AI practitioners, and technology leaders under the banner of a community event “by testers for testers.” The 2026 speaker lineup included practitioners from across the testing and quality ecosystem, including Kristel Kruustük, Nicole van Gijn, Jonathon Wright, and myself representing Inflectra.
For me, the experience started even before the conference opened. Tallinn itself provided a fitting backdrop for the conversations that followed. The city is a striking blend of old and new: medieval streets, Soviet-era structures, modern digital infrastructure, and a strong sense of technological identity. That contrast felt highly relevant to where software quality is today. We are taking decades of testing discipline and applying it to a new generation of systems that are probabilistic, adaptive, and often difficult to evaluate using traditional methods alone.
AI Is Not Replacing Testers. It Is Changing What We Test.
One of the strongest themes from the event was that AI is not making human judgment obsolete. Instead, it is moving human judgment further up the value chain.
Jonathon Wright opened the first conference day with a keynote on how AI may create more opportunities than ever for testers, centered around the emerging discipline of AI Confidence. That framing resonated throughout the event: testers are no longer only validating whether a system conforms to a specification; increasingly, they are helping organizations understand whether AI-enabled systems can be trusted in realistic, ambiguous, and changing conditions.
Kristel Kruustük captured a similar theme in her reflections from the event: humans are not being replaced, they are being “promoted.” Critical thinking, oversight, creativity, and confidence are becoming more important, not less.
That is a powerful message for the testing community. The discipline is evolving from finding defects in deterministic systems to evaluating behavior, risk, reliability, and trust in systems that may not produce the same output twice.
Rethinking Testing for Non-Deterministic Systems
My talk focused on one of the biggest challenges facing software teams today: how do you test AI agents, chatbots, and other non-deterministic systems?
Traditional test automation relies on predictable inputs and expected outputs. That model still matters, but it is not enough for AI systems. A chatbot may answer the same question differently depending on context. An AI agent may pursue a task through multiple possible paths. A workflow powered by a large language model may succeed, fail, hallucinate, leak data, ignore constraints, or produce a partially correct result that still creates business risk.
In my session, I discussed how teams can use an orchestrator pattern to systematically test AI behavior at scale. The core idea is to move beyond one-off prompt experiments and build a repeatable framework that combines:
- input agents to generate realistic and adversarial scenarios;
- judge agents to evaluate AI outputs against defined criteria;
- orchestration logic to manage the execution of tests;
- analytics to identify patterns, failure modes, and confidence levels.
This approach allows organizations to evaluate AI agents and chatbots more systematically, especially when exact-match assertions are too brittle or too simplistic. In the talk, I positioned this as part of a broader AI Assurance strategy: making sure AI systems are not only innovative, but safe, reliable, explainable, and fit for purpose.
LLM-as-Judge Is Useful, But It Needs Governance
One of the most practical takeaways from the conference was that LLM-as-judge techniques are becoming an important part of AI quality engineering. Used carefully, they can help teams evaluate tone, completeness, correctness, safety, policy compliance, and task success across a large number of AI interactions.
However, LLM judges should not be treated as magic. They require careful prompt design, calibration, review, and observability. Teams need to understand what the judge is evaluating, how consistent it is, where it may be biased, and when human review is still required.
That is why orchestration and analytics matter. The value is not just in asking one model to judge another. The value comes from turning AI evaluation into an engineered process: one that can be repeated, inspected, governed, and improved over time.
Testing AI Requires a New Language: Confidence Engineering
A recurring idea at Nordic Testing Days was that the language of quality is changing. “Testing” remains essential, but AI systems require us to talk more explicitly about confidence.
Confidence engineering is a useful way to describe this shift. It includes traditional testing, but also expands into areas such as:
- behavioral evaluation;
- safety testing;
- prompt-injection testing;
- data leakage assessment;
- AI governance;
- risk-based assurance;
- human oversight;
- production monitoring;
- continuous evaluation.
For business leaders, this matters because AI adoption is often slowed not by lack of ambition, but by lack of trust. Organizations want to use AI, but they need evidence that their systems behave reliably and safely before they can scale them into customer-facing or mission-critical workflows.
For testers and QA leaders, this creates a major opportunity. The skills that have always made great testers valuable - curiosity, skepticism, systems thinking, risk awareness, and user empathy - are exactly the skills needed to evaluate AI systems.
Livestreaming Confidence Engineering with Jonathon Wright
One of the highlights of the week was the chance to sit down with my good friend Jonathon Wright for a Confidence Engineering livestream live from Tallinn. We were also joined by Jason Arbon from Seattle, with Kristjan Karmo joining us for part of the conversation as well.
The livestream gave us a chance to continue the conversation beyond the formal conference sessions. We discussed AI Assurance, the changing role of testers, the need for practical evaluation frameworks, and how teams can move from enthusiasm around AI to evidence-based confidence in AI systems.
That is an important distinction. Many organizations are experimenting with AI. Fewer have a mature strategy for testing it. The next phase of enterprise AI adoption will depend on whether teams can demonstrate that AI systems are reliable enough, safe enough, and governed enough to be trusted.
Key Takeaways from Nordic Testing Days 2026
For me, five lessons stood out from the event.
- AI quality is becoming a mainstream testing concern. It is no longer a niche topic for AI researchers or data science teams. Testers, QA leaders, product teams, and business stakeholders all need to understand how AI systems behave and fail.
- Non-determinism changes the testing model. We cannot rely only on static expected results. We need evaluation frameworks that can handle variability, ambiguity, and probabilistic behavior.
- Testers are becoming confidence engineers. The profession is moving toward a broader mandate: helping organizations decide whether software, AI agents, and automated workflows are trustworthy enough to use.
- AI evaluation must be systematic. Ad hoc prompt testing is not enough. Teams need repeatable orchestration, representative test data, judge models, analytics, governance, and human review.
- Finally, the human role is becoming more important, not less. AI can generate tests, execute scenarios, analyze results, and assist with judgment. But humans still define risk, determine acceptability, interpret trade-offs, and decide what “good enough” means in a real business context.
From Testing Software to Assuring AI
Nordic Testing Days 2026 made it clear that the testing community has a central role to play in the next phase of AI adoption.
As AI agents and chatbots become more capable, they also become more difficult to evaluate using legacy methods alone. Enterprises will need practical ways to test these systems for correctness, robustness, safety, and trustworthiness. That is where AI Assurance comes in.
At Inflectra, this is an area we are deeply focused on: helping organizations manage requirements, risks, tests, governance artifacts, and execution frameworks in a way that supports confidence in both traditional and AI-enabled systems.
The future of testing is not smaller because of AI. It is bigger. It is more strategic. And it is more closely tied than ever to business trust.
Nordic Testing Days was a great reminder that this community is ready for that challenge.