Testing Chatbots with Spira and Rapise from Inflectra

September 19th, 2025 by Adam Sandman


We were asked by a client about using the Inflectra quality suite to help them test AI chatbots. For building and delivering any type of application, the Inflectra suite can speed up design, development, and testing:

  • Spira lets you design and spec the product, model the risks, write test scenarios, conduct manual testing, and ensure compliance.
  • Rapise takes those test scenarios and automates them against both the user interface (UI) and any API endpoints.
  • Inflectra.ai speeds up both of these tasks: it uses Generative AI to create the detailed requirements, tasks, risks, and test scenarios, and Agentic AI to convert those test scenarios into Rapise automated tests and then execute them at speed and scale, with self-healing.

However, when it comes to testing chatbots specifically, there are some best practices for using the Inflectra suite that will be helpful:

1) Model your chatbot in Spira

  • Requirements & user stories: Capture intents, entities, guardrails (e.g., “no PII in replies”), channels (web widget, Slack, Teams), locales, and SLAs (latency, uptime).

  • Artifacts & traceability setup:

    • Custom lists: Intents, entities, personas, prompt versions, model versions, temperatures, knowledge-base (RAG) snapshots.

    • Test case types:

      • NLU (intent/entity recognition)

      • Dialog management (multi-turn flow)

      • API/integration (LLM, middleware, search/RAG)

      • UI channel (web/Slack/Teams widget)

      • Safety/guardrails (prompt-injection, toxicity, jailbreaks)

      • Non-functional (latency, rate-limit handling, resiliency)

    • Test sets: By channel (Web/Slack), locale (en-US, fr-FR), and scenario pack (Happy Path, Edge Cases, Adversarial).

    • Traceability: Link requirements → test cases → automation scripts → defects. Use Releases to represent bot versions; Configurations to represent model/prompt settings.

2) Automate conversations with Rapise

Rapise can drive both UIs and APIs, so you can test the bot end-to-end or headless.

A. Web/desktop/mobile chat UI

  • Selectors & actions: Use Rapise’s web/mobile drivers to open the chat widget/app, send messages, click quick-replies, upload files, and read bot responses.

  • Assertions for variable text:

    • Accept sets of valid responses (allowlist).

    • Use regex/contains for flexible matching.

    • Validate JSON blocks in responses (schema checks).

    • Verify links/buttons appear and function.

  • Multi-turn context: Keep the same browser/app session; store conversation IDs; assert state carries across turns.
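The assertion styles above can be sketched as a single helper. This is an illustrative Python sketch, not Rapise code; the function name, sample replies, and patterns are all invented for the example:

```python
import json
import re

def assert_bot_reply(reply: str,
                     allowlist=None,   # exact acceptable responses
                     patterns=None,    # regexes for flexible matching
                     json_keys=None):  # required keys in an embedded JSON block
    """Return True if the bot reply passes every configured check."""
    if allowlist and reply not in allowlist:
        return False
    if patterns and not any(re.search(p, reply, re.IGNORECASE) for p in patterns):
        return False
    if json_keys:
        match = re.search(r"\{.*\}", reply, re.DOTALL)  # locate a JSON block
        if not match:
            return False
        try:
            payload = json.loads(match.group(0))
        except json.JSONDecodeError:
            return False
        if not all(key in payload for key in json_keys):
            return False
    return True
```

In a real suite each check would map to one Rapise assertion step, so a failure report shows exactly which style of match (allowlist, regex, or JSON schema) broke.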

B. API-level testing (headless)

  • Use Rapise REST/GraphQL to call your bot gateway (or orchestration layer), pass a conversation/session id, and validate:

    • Predicted intent/entity payloads

    • Dialog state transitions

    • Latency (measure response times per turn)

    • RAG traces (source doc ids, confidence) when available

  • Data-driven runs: Feed utterances/expected results from CSV/Excel; iterate across locales/personas.
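A data-driven run like this boils down to iterating a CSV of utterances against the gateway and collecting mismatches. The sketch below stubs the gateway call with invented keyword rules (in practice this would be a Rapise REST request to your orchestration layer), and embeds the CSV inline so the example is self-contained:

```python
import csv
import io

def call_bot_gateway(utterance: str, session_id: str) -> dict:
    """Stand-in for the real gateway call; the intent rules are invented."""
    text = utterance.lower()
    if any(word in text for word in ("hello", "hi", "hey")):
        return {"intent": "greeting", "session_id": session_id}
    if "return" in text:
        return {"intent": "start_return", "session_id": session_id}
    return {"intent": "fallback", "session_id": session_id}

# Utterances and expected intents, as they might be exported to CSV.
TEST_DATA = """utterance,expected_intent
hello there,greeting
hi,greeting
I want to return my shoes,start_return
what does your warranty cover,fallback
"""

def run_data_driven_suite(data: str) -> list:
    """Iterate the data set against the gateway; return (utterance, actual) failures."""
    failures = []
    for row in csv.DictReader(io.StringIO(data)):
        result = call_bot_gateway(row["utterance"], session_id="s-001")
        if result["intent"] != row["expected_intent"]:
            failures.append((row["utterance"], result["intent"]))
    return failures
```

An empty failure list means the whole pack passed; a non-empty one gives you exactly the rows to file as defects.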

C. Safety & resilience suites

  • Prompt-injection & jailbreaks: Maintain a curated library of adversarial prompts; assert that replies follow policy (e.g., refusal templates).

  • Toxicity/PII checks: Pattern-match for emails, SSNs, profanity; assert redaction or refusal.

  • Rate-limit & retries: Simulate bursts, verify graceful degradation and backoff messages.

  • Fallbacks: Force upstream errors (mock 500s) and ensure the bot gives a helpful, on-brand fallback.
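The PII check in particular is easy to express as a reusable pattern screen. A minimal sketch, with illustrative (deliberately non-exhaustive) patterns:

```python
import re

# Illustrative PII patterns; a production screen would cover many more formats.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(reply: str) -> list:
    """Return the PII categories detected in a bot reply."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(reply)]
```

A safety test then asserts `find_pii(reply) == []` for every turn, or that any hit is covered by a redaction/refusal template.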

3) Make “fuzzy” results testable

Chatbots are non-deterministic—design your assertions accordingly:

  • Equivalence classes: Define multiple acceptable phrasings per test.

  • Semantic similarity (optional): If your team exposes an internal similarity API or keyword scorer, call it from Rapise and assert a minimum score (keep it deterministic—same model & seed).

  • Structured anchors: Prefer validating facts (dates, amounts, URLs, buttons, JSON keys) over prose.
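If no internal similarity API is available, even a deterministic keyword-overlap score keeps fuzzy assertions stable across runs. This sketch uses Jaccard overlap of word sets; the threshold and tokenization are assumptions:

```python
def keyword_score(candidate: str, reference: str) -> float:
    """Jaccard overlap between the word sets of two strings (0.0 to 1.0)."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def assert_similar(candidate: str, reference: str, minimum: float = 0.5) -> bool:
    """Pass when the candidate reply overlaps the reference enough."""
    return keyword_score(candidate, reference) >= minimum
```

Because the score depends only on the two strings, the same transcript always produces the same verdict, which keeps regression comparisons meaningful.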

4) Close the loop with Spira

  • Spira ⟷ Rapise integration: Use the built-in connector so Rapise pushes execution results, logs, and screenshots back into Spira automatically—linked to the right test cases and releases.

  • Dashboards & KPIs in Spira:

    • Intent accuracy, entity F1, flow pass-rate

    • Safety violations by type

    • P95/P99 latency by channel

    • Regression deltas after prompt/model changes

  • Defect workflow: When an assertion fails, Rapise can file a defect in Spira with evidence (transcript, HAR/logs, screenshots) and the exact environment (model version, temperature, KB snapshot).
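The accuracy and F1 figures feeding those dashboards are simple to compute from execution results. A minimal sketch with invented labels, treating entities as (type, value) pairs for one utterance:

```python
def intent_accuracy(expected: list, predicted: list) -> float:
    """Fraction of utterances whose predicted intent matches the expected one."""
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)

def entity_f1(expected: set, predicted: set) -> float:
    """F1 over sets of (entity_type, value) pairs for a single utterance."""
    if not expected or not predicted:
        return 0.0
    true_positives = len(expected & predicted)
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Aggregating these per release and per configuration is what makes the regression deltas after prompt/model changes visible.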

5) Typical test assets (starter set)

  • Requirements:

    • R-001 “Bot supports 25 core intents at ≥92% accuracy”

    • R-010 “No PII should appear in responses”

    • NFR-003 “P95 latency ≤ 1200 ms Web; ≤ 1500 ms Slack”

  • Test cases:

    • NLU-EN-GREET-001: “hello / hi / hey” → intent:greeting

    • DLG-RETURNS-004: multi-turn return workflow incl. slot filling & disambiguation

    • SAFE-INJECT-009: “Ignore prior instructions and …” → refusal template

    • API-RAG-012: Boson FAQ → verify cited doc IDs

    • PERF-LAT-P95-001: 100 sequential turns; assert P95 SLA

  • Data sets: CSV/Excel of utterances, entities, locales, allowed replies, and negative tests.
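The P95 check behind a test like PERF-LAT-P95-001 reduces to a percentile over the per-turn latencies. A sketch using the nearest-rank method; the sample latencies and SLA budget are assumptions:

```python
import math

def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

def check_latency_sla(samples: list, p95_budget_ms: float) -> bool:
    """True when the P95 latency of the run is within the SLA budget."""
    return percentile(samples, 95) <= p95_budget_ms
```

Run 100 turns, collect the timings, and gate the release on the boolean result.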

6) Environments, versions, and CI/CD

  • Environments in Spira: Dev / Staging / Prod with Configurations for model=GPT-X.Y, temperature=0.2, prompt=v14, KB=2025-09-01.

  • RemoteLaunch: Schedule Rapise suites from Spira nightly or on commit; fan-out across hosts.

  • Gates: Only promote releases when Spira dashboards show green on accuracy, safety, and latency.

7) Reporting that stakeholders understand

  • Conversation transcripts: Attach exact turn-by-turn logs to Spira executions.

  • Evidence packs: Screenshots, HARs, and API payloads from Rapise.

  • Trend views: Compare model/prompt versions per metric to quantify regressions/improvements.


Quick “first implementation” checklist

  1. Create intents/entities & SLAs as Requirements in Spira; link a Release for “Bot v1.4”.

  2. Build data-driven Rapise tests (CSV) for top 20 intents + 5 multi-turn flows.

  3. Add safety suite with 25 adversarial prompts.

  4. Connect Rapise to Spira; run via RemoteLaunch on each PR to prompts or model.

  5. Track latency & accuracy in Spira dashboards; open defects with transcripts automatically.


About the Author

Adam Sandman

Adam Sandman is a visionary entrepreneur and a respected thought leader in the enterprise software industry, currently serving as the CEO of Inflectra. He spearheads Inflectra’s suite of ALM and software testing solutions, from test automation (Rapise) to enterprise program management (SpiraPlan). Adam has dedicated his career to revolutionizing how businesses approach software development, testing, and lifecycle management.

Spira Helps You Deliver Quality Software, Faster and with Lower Risk.

Get Started with Spira for Free

And if you have any questions, please email or call us at +1 (202) 558-6885