Designing Trustworthy CUIs for Reliable Information Retrieval

A design thinking study exploring how conversational interfaces can be designed to support reliable, AI-driven information retrieval in quality management systems, with a focus on transparency and user trust.

  • M.Sc. Thesis
  • Design Thinking
  • User Research
  • Usability Testing
  • Prototyping
  • Figma
Role: UX Researcher & Designer
Timeline: Autumn 2025
External Partner: AM System

Problem

Quality management systems store critical compliance documentation, but finding the right information can be slow and dependent on prior system knowledge. Folder-based navigation and free-text search often assume users already know what they're looking for.

A conversational interface offers an alternative. But in a compliance-driven domain, the design question isn't whether it works. It's whether users can trust what it surfaces.

Process

The design thinking process and its five phases: empathize, define, ideate, prototype, and test.
  1. Empathize: Interviews

    Semi-structured interviews were conducted with 3 end users and 2 AM System employees to understand current workflows, pain points, and attitudes toward AI-assisted search.

  2. Define: Thematic Analysis & User Need Statements

    A thematic analysis of the interview data resulted in 6 themes and 7 user need statements. Key themes included "transparency enables trust", "reliability through metadata", and "efficiency as a motivator".

  3. Ideate: HMW Questions & Brainstorming

    The user need statements were reframed into How Might We questions to create actionable design challenges. Due to project constraints, ideation was conducted individually, which kept the output focused but limited the breadth of ideas explored. Solutions centered on source visibility, metadata display, role-based filtering, and multiple document formats.

  4. Prototype: LoFi → HiFi in Figma

    Two prototype iterations were created in Figma. LoFi testing revealed issues with navigation, visual density, and response length, which were addressed in the HiFi iteration. AI responses were simulated in Figma rather than generated by a live implementation, which kept the focus on interface design and trust, independent of the variability of AI output.

  5. Test: Two rounds of moderated usability testing

    The LoFi prototype was tested with 4 AM System employees, and the HiFi prototype with 4 end users, using a think-aloud protocol together with the SUS and S-TIAS questionnaires.

Prototype

Start view with role-based suggested questions and a search input field
Conversation view: the system asks for more context, then returns a generated response with a source card containing document metadata and a related document suggestion
Original document view of a routine, showing full metadata (creator, approver, and dates) alongside the document content
Summary view of a routine, with collapsible sections for quick navigation
Flowchart view of an assembly instruction, with information, step, and control point nodes connected by arrows
Presentation view of an assembly instruction, a step-by-step format with a progress bar at the top and the current step below

Results

The HiFi prototype received a SUS score of 90 (an 'A' grade indicating excellent perceived usability) and a trust score of 5.50 out of 7 on the S-TIAS scale. All four participants completed tasks without difficulty and described natural-language search as intuitive and familiar.

But the numbers only tell part of the story. The more interesting finding was how users interacted with the generated answers: not as a final authority, but as an entry point.

SUS score (System Usability Scale): 90. Grade A, excellent usability.

Trust score (S-TIAS): 5.50 out of 7. High level of perceived trust in the prototype.
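For readers unfamiliar with how a SUS score such as 90 is derived, the sketch below follows the standard SUS scoring rule: each of the ten items is answered on a 1 to 5 scale, odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the 0 to 40 sum is multiplied by 2.5 to give a 0 to 100 score. It assumes the study-level figure is the mean of per-participant scores, a common convention. The function names are hypothetical, and the responses in the example are made up for illustration; they are not the study's data.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one participant's System Usability Scale questionnaire.

    `responses` holds the ten answers in questionnaire order, each on a
    1 (strongly disagree) to 5 (strongly agree) scale.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS responses must be between 1 and 5")

    total = 0
    for index, response in enumerate(responses, start=1):
        if index % 2 == 1:
            # Odd items are positively worded: contribution is response - 1.
            total += response - 1
        else:
            # Even items are negatively worded: contribution is 5 - response.
            total += 5 - response
    # The raw sum ranges from 0 to 40; multiplying by 2.5 maps it to 0..100.
    return total * 2.5


def study_sus(all_responses: list[list[int]]) -> float:
    """Hypothetical helper: mean SUS score across all participants."""
    return mean(sus_score(r) for r in all_responses)


if __name__ == "__main__":
    # Made-up responses for two illustrative participants.
    participant_a = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
    participant_b = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
    print(sus_score(participant_a))  # 100.0
    print(sus_score(participant_b))  # 75.0
    print(study_sus([participant_a, participant_b]))  # 87.5
```

Scoring each participant first and then averaging keeps individual scores available, which is useful with a sample as small as four participants, where a single outlier can shift the mean noticeably.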

Users treated the generated answer as a starting point, not the conclusion.

Every participant opened the source card at least once after receiving a response, often immediately, before they had finished reading the answer itself. They weren't looking for the AI to be right. They were looking for a faster way to reach the document they could verify themselves.

As one participant put it: "Without the source card, I would not have felt like I could trust the response."

Trust was not built through the idea of an AI-driven system. It was built through transparency.

What made users trust the system was the ability to verify information by accessing the source document and validating the metadata: creator, approver, version, and date. The conversational interaction itself was familiar and efficient, but it wasn't what created trust. It was the interface design, specifically the source card, the document metadata, and one-click access to the original document, that shaped perceived trust.

Read the full thesis