January 31st, 2024 by inflectra
Tags: ai, artificial intelligence, chatgpt, genai
This article outlines our understanding of the current state of the art for using artificial intelligence and machine learning in software testing and QA, and challenges/opportunities for our customers. We are focusing primarily on Generative AI (GenAI), but there are other AI/ML use cases that are very important in QA.
Disclaimer: As with any discussion around the ‘state of GenAI’, the capabilities improve daily, so what is written here may well be superseded in short order!
With AI, testers can now teach and train the Large Language Models (LLMs) to understand the application(s) being tested. As described in this Microsoft document, Teachable Agents are a way to get around the limitation of LLMs where conversational assistants (such as ChatGPT) “forget” their learnings in between sessions. From the document:
“Teachability addresses these limitations by persisting user teachings across chat boundaries in long-term memory implemented as a vector database. Instead of copying all of memory into the context window, which would eat up valuable space, individual memories (called memos) are retrieved into context as needed. This allows the user to teach frequently used facts and skills to the teachable agent just once, and have it recall them in later chats.”
So the role of the tester shifts to writing prompts that direct the AI to perform tasks and test the application. The tester identifies incorrect or incomplete answers and 'teaches' the system by supplying corrective facts or examples. With this approach, fixes are immediate and testing becomes a matter of fine-tuning.
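The memo-retrieval idea in the quote above can be sketched in a few lines. This is a toy illustration of the pattern, not Microsoft's implementation: the bag-of-words "embedding" stands in for a real learned embedding and vector database, and the taught fact is an invented example.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy embedding: a bag-of-words vector (real systems use learned embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoStore:
    """Long-term memory: teachings persist across chat sessions."""
    def __init__(self):
        self.memos = []  # (embedding, teaching) pairs

    def teach(self, fact):
        self.memos.append((embed(fact), fact))

    def recall(self, query, top_k=1):
        """Retrieve only the most relevant memos into the context window."""
        scored = sorted(self.memos, key=lambda m: cosine(m[0], embed(query)), reverse=True)
        return [fact for _, fact in scored[:top_k]]

# Session 1: the tester teaches the agent a fact about the app under test.
store = MemoStore()
store.teach("The login page requires a company code before the username field")

# Session 2 (a new chat): the relevant memo is recalled into context as needed,
# rather than copying all of memory into the prompt.
context = store.recall("write a test for the login page")
```

The key design point is the last line: only the memos relevant to the current query are pulled into the context window, which is what keeps the approach cheap as the body of teachings grows.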
Previously, test automation was seen mainly as a form of "checking" or regression testing, i.e. preventing existing errors or known issues from reoccurring. The move from traditional "scripted" manual testing to free-form exploratory or session-based testing was seen as a better use of human testers, with the scripted tests being automated.
However, with AI, a lot of exploratory testing can be potentially automated. For example, in this paper on how to use GPT-4V for testing, it was found to be a viable solution for navigating an iOS device, selecting the right application and executing a series of prompted tasks:
(Reproduced from the paper referenced above)
In the example above, the GPT-4V bot was able to complete the tasks on the iOS device, provide screenshots of its actions, and even generate textual reasoning explaining why it completed each task in a particular way and whether it succeeded or failed.
The role of the human tester in this scenario would be to review the results, determine whether they are correct and complete, and if not, tune or teach the model to improve them.
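The observe-think-act loop described in the paper can be sketched as follows. The `ask_vision_model` function is a stand-in for a real multimodal API call (such as GPT-4V); here it is stubbed with hard-coded behavior so the control flow can run, and the screen descriptions and actions are invented for illustration.

```python
def ask_vision_model(screenshot, task, history):
    """Stub: a real implementation would send the screenshot and task to a
    multimodal LLM and parse its reply into an action plus reasoning."""
    if "Settings icon" in screenshot:
        return {"action": "tap('Settings')",
                "reasoning": "The Settings icon is visible, so tap it.",
                "done": False}
    return {"action": "stop",
            "reasoning": "The Settings screen is open; task complete.",
            "done": True}

def run_task(task, get_screenshot, execute, max_steps=10):
    """Drive the device until the model reports the task as done."""
    history = []
    for _ in range(max_steps):
        step = ask_vision_model(get_screenshot(), task, history)
        history.append(step)  # keep the textual reasoning for the human tester to review
        if step["done"]:
            break
        execute(step["action"])
    return history

# Toy harness: two successive "screenshots" of a hypothetical iOS device.
screens = iter(["Home screen with Settings icon", "Settings screen"])
log = run_task("Open Settings",
               get_screenshot=lambda: next(screens),
               execute=lambda action: None)
```

The `history` list is what the human tester reviews afterwards: each entry pairs an action with the model's stated reasoning, which is exactly the correct/complete check described above.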
From the Inflectra point of view, it is realistic to expect that the "Play" button for a manual test in our Spira test management tool would just work without human interaction! Similarly, Inflectra hosts a software testing competition called the "SoftwareTestBowl". At some point (just as we have seen with chess and Go), GPT will start winning such testing competitions. This is a good example of how fast the AI world is moving.
Another major area that AI is changing is the creation of test scenarios themselves. For example, it will make sense to have an AI-based recorder that simply watches the user performing a task and generates an audio and/or textual description of the scenario.
This is already a valuable tool when conducting UX/UI usability interviews and assessments. The AI can look at the user performing a task and generate a summary transcript and list of tasks. The next step will be for the AI to be able to make recommendations based on common design patterns.
From the Inflectra point of view, one area of promising functionality is the ability to simply upload/draw the sketch of a proposed application into Spira and have Spira generate the full specification for it. For example, using the free LLaVA: Large Language and Vision Assistant and the Library Information System sample web application:
When you ask it: “This is a web software application. Can you generate the specification for it?” you get:
The image you've provided appears to be a screenshot of a web-based application interface for a library information system. Based on the visible elements, I can infer some of the features and functionalities that this application might have. Here's a speculative specification for the application:
Application Name: Library Information System
Purpose:
The Library Information System is a web-based application designed to manage and organize library resources, including books, authors, and genres. It aims to provide librarians and library staff with a user-friendly interface to catalog, search, and maintain their library's collection.

Features:
- Book Management:
- Ability to add new books to the system.
- Fields for book details such as title, author, genre, and publication date.
- Option to edit book details.
- Fields for additional information like ISBN, publisher, and number of copies.
- Ability to mark books as "Out of Print" or "In Print."
- Author Management:
- Fields for author details such as name, birth date, and death date.
- Option to add multiple works by the same author.
- Ability to link authors to their works.
- Genre Management:
- Fields for genre details such as name and description.
- Option to add new genres.
- Ability to link genres to books.
- Search Functionality:
- Advanced search options to find books by title, author, genre, and other criteria.
- Ability to filter results by availability (in print/out of print).
- User Interface:
- Clean and intuitive layout for easy navigation.
- Responsive design for use on different devices.
- Clear and concise error messages for user feedback.
- Security:
- Secure login system for authorized users.
- Role-based access control to restrict certain actions to specific users.
- Data encryption for sensitive information.
- Database Integration:
- Backend database integration for efficient storage and retrieval of data.
- Ability to import and export data in standard formats.
- Reporting:
- Generating reports on book circulation, author popularity, and genre trends.
- Customizable reports to suit the needs of the library.
- User Support:
- Online help or FAQ section for user assistance.
- Contact form for user inquiries.
- Scalability:
- Designed to handle a large number of books, authors, and genres.
- Ability to add additional fields or modules as needed.
Technical Requirements:
- Operating System: Compatible with modern web browsers (e.g., Chrome, Firefox, Safari, Edge).
- Database: MySQL, PostgreSQL, or other relational database management systems.
- Server Requirements: Apache, Nginx, or other web servers capable of running PHP or similar scripting languages.
- Front-end Technologies: HTML, CSS, JavaScript, and possibly frameworks like React, Angular, or Vue.js for a responsive and interactive interface.
- Back-end Technologies: PHP, Python, Ruby on Rails, or similar server-side scripting languages for database interaction and business logic.
Development Process:
- Requirements gathering and analysis.
- Design and prototyping.
- Development and testing.
- Deployment and maintenance.
Conclusion:
The Library Information System is a comprehensive web application designed to streamline library operations, from cataloging to searching and reporting. It is intended to be user-friendly, secure, and scalable to meet the needs of libraries of various sizes.
From this structured description, we can create in Spira a list of:
With additional prompting, we can generate derived items such as project risks and include them in the Spira project model.
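Turning the LLM's structured output into importable items can be as simple as parsing its bullet hierarchy. The sketch below is an assumption about how an importer might work, not Spira's actual import code: top-level bullets become requirements and nested bullets become their child items, with the spec excerpt taken from the generated text above.

```python
# Excerpt of the LLM-generated specification from above.
SPEC = """\
- Book Management:
  - Ability to add new books to the system.
  - Option to edit book details.
- Author Management:
  - Fields for author details such as name, birth date, and death date.
"""

def parse_spec(text):
    """Naive parser: indentation depth decides requirement vs child item."""
    requirements = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("- "):
            continue
        item = stripped[2:].rstrip(":.")
        if line.startswith("- "):          # top-level bullet -> requirement
            requirements.append({"name": item, "children": []})
        elif requirements:                 # nested bullet -> child of the last requirement
            requirements[-1]["children"].append(item)
    return requirements

reqs = parse_spec(SPEC)
```

A production importer would of course push these items through Spira's REST API rather than building dictionaries, but the shape of the transformation is the same.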
Although (as we have seen), using AI can improve the productivity and efficiency of testing existing applications, GenAI itself requires testing, and that introduces new types of testing:
In section 3 we talked about AI generating scenarios; the next level is that AI will be able to take the same sketches and prototypes and develop the actual applications. These "fluid" applications will be capable of being "regenerated" every day as the requirements change: from Agile to DevOps (and CI/CD) to continual development.
Imagine that we want to create a pizza ordering app:
That is, we can skip all the planning/requirements/test stages because they are done implicitly in a few seconds. But then the next level could be:
However, how do we test these dynamic, AI-composed applications? Their functionality, structure, and features will change continuously, making it difficult to know what the correct behavior even is. We still need to do some regression testing... somehow!
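One pragmatic answer, sketched here as a suggestion rather than a settled practice: instead of pinning exact behavior (which changes with every regeneration), assert invariants that must hold however the app is rebuilt. The pizza-ordering function below is a hypothetical stand-in for whatever today's generated build exposes; only the invariants are meant to survive from one regeneration to the next.

```python
def price_order(items):
    """Hypothetical stand-in for today's AI-generated pricing logic."""
    menu = {"margherita": 9.0, "pepperoni": 11.0}
    return sum(menu[item] for item in items)

def check_invariants(price_fn):
    """Regression checks that do not depend on exact prices or layout."""
    assert price_fn([]) == 0                        # an empty order costs nothing
    one = price_fn(["margherita"])
    assert one > 0                                  # a real item costs something
    two = price_fn(["margherita", "margherita"])
    assert two >= one                               # adding items never lowers the total
    return True

# Re-run the same invariant suite against every regenerated build.
ok = check_invariants(price_order)
```

Tomorrow's regenerated app can change its menu, prices, and structure freely; the suite only fails if one of the durable business rules is broken, which is the kind of "reasoning"-level check a fluid application still permits.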
There are new ways for reporting and analytics to be created using AI:
These are complex to test, and it is difficult to create test scenarios for them without context. The tester (whether human or AI) needs to apply "reasoning" to determine whether the system is behaving as desired.
When using AI to do testing, there are some limitations due to AI “laziness” that need to be considered:
Ironically, the latest GPT-4 models are expensive to use, and unless the results from the prompts are cached, using them for testing is currently more expensive (in terms of token costs) than using human testers/designers.
However, that will change in the future.
The simpler GPT-3.5 Turbo models are currently cost-effective and can be used for more limited tasks, as we are already demonstrating with the Inflectra Spira AI functionality that generates risks, test cases, scenarios, and tasks using GPT-3.5.
To make GPT-4 models cost effective at present, you need to use token optimization strategies such as using the Teachable Agents and vector databases that we discussed in section 1.
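The caching side of that strategy can be sketched in a few lines: identical prompts should hit the model only once. `call_model` below is a stub standing in for an expensive GPT-4-class API call, and the call counter simulates token spend; the prompt text is an invented example.

```python
import hashlib

calls = {"count": 0}

def call_model(prompt):
    """Stub for an expensive LLM API call; each real call would cost tokens."""
    calls["count"] += 1
    return f"generated test steps for: {prompt}"

_cache = {}

def cached_call(prompt):
    """Return a cached result when the exact same prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

first = cached_call("Generate a test case for the login page")
second = cached_call("Generate a test case for the login page")  # served from cache
```

Exact-match caching only helps with repeated prompts; the teachable-agent approach from section 1 goes further by retrieving just the relevant memos into context, shrinking the prompt itself.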
As you can see from this article, we are witnessing a real-time revolution in how AI can make software testing easier, while simultaneously creating new types of applications to be tested and new ways we need to think about testing. Clearly 2024 will be an interesting year in testing!
And if you have any questions, please email or call us at +1 (202) 558-6885