Confidence scores and reasoning

AI Workflows can display confidence scores and AI reasoning during workflow execution and moderation.

These features help teams better understand:

  • how certain the AI is about generated output

  • why the AI generated specific results

  • which products may require manual review

Confidence scoring and reasoning improve transparency and help maintain quality control inside automated workflows.


What are confidence scores?

Confidence scores indicate how certain the AI is about the generated result.

Higher confidence usually means:

  • clear source data

  • strong contextual information

  • reliable extraction or generation

Lower confidence may indicate:

  • incomplete product data

  • ambiguous descriptions

  • weak source information

  • uncertain classifications

Confidence scores help reviewers identify which outputs may require closer inspection.


What is AI reasoning?

AI reasoning explains why the AI generated specific output.

Reasoning provides visibility into:

  • extraction logic

  • classification decisions

  • translation choices

  • generated content behavior

This helps moderators better understand how the AI reached a result.


Why confidence scores matter

Large workflows may process:

  • thousands of products

  • multiple languages

  • incomplete supplier catalogs

  • inconsistent product structures

Reviewing every product manually is often unrealistic.

Confidence scores help teams:

  • prioritize moderation effort

  • focus on uncertain outputs

  • automate high-confidence results

  • improve workflow efficiency


Why AI reasoning matters

AI reasoning improves transparency and trust inside automation pipelines.

Without reasoning, reviewers only see the generated result.

With reasoning, reviewers also understand:

  • what information the AI used

  • how conclusions were reached

  • why specific values were selected

This improves moderation quality and debugging capabilities.


Where confidence scores appear

Confidence scores may appear during:

  • Attribute Extraction

  • Content Enrichment

  • Translation

  • Category Mapping

  • Validation workflows

  • Quality scoring workflows

Scores are typically visible inside:

  • test results

  • moderation screens

  • workflow result views


Where AI reasoning appears

AI reasoning may be visible inside:

  • workflow test results

  • moderation interfaces

  • action result screens

Reasoning is often accessible by opening the generated result details.


Example confidence score

Example:
Flavor extraction:

  • Flavor → Salmon

  • Confidence score → 96%

This indicates the AI is highly certain that the product's flavor is Salmon, based on the available source information.
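As a sketch, a result like this can be represented as a simple record and flagged against a review threshold. The field names and the `needs_review` helper below are illustrative assumptions, not the product's actual API:

```python
# Hypothetical extraction result; field names are illustrative
# assumptions, not the actual workflow schema.
extraction_result = {
    "attribute": "Flavor",
    "value": "Salmon",
    "confidence": 0.96,  # 96% certainty
}

def needs_review(result, threshold=0.80):
    """Return True when a result falls below the chosen review threshold."""
    return result["confidence"] < threshold

print(needs_review(extraction_result))  # False: 96% clears an 80% bar
```

The 0.80 threshold is an assumption for illustration; teams typically tune it per workflow.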


Example AI reasoning

Example reasoning:
"The product description references salmon-based dry cat food intended for adult cats, therefore Flavor was assigned to Salmon and Lifecycle to Adult."

This helps reviewers understand why the extraction was generated.
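Reasoning can travel with the result itself, so a moderation screen can show the explanation next to each value. The structure below is an illustrative assumption, not the actual workflow schema:

```python
# Hypothetical result carrying both extracted values and the AI's
# reasoning; keys are illustrative assumptions.
result = {
    "attributes": {"Flavor": "Salmon", "Lifecycle": "Adult"},
    "confidence": 0.96,
    "reasoning": (
        "The product description references salmon-based dry cat food "
        "intended for adult cats, therefore Flavor was assigned to "
        "Salmon and Lifecycle to Adult."
    ),
}

# A moderation view can surface each value with its confidence,
# followed by the explanation.
for attribute, value in result["attributes"].items():
    print(f"{attribute}: {value} ({result['confidence']:.0%})")
print(result["reasoning"])
```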


High-confidence outputs

High-confidence outputs often come from:

  • clear product descriptions

  • structured source data

  • strong contextual signals

  • unambiguous terminology

These outputs may require less manual review.


Low-confidence outputs

Low-confidence outputs may occur when:

  • supplier content is incomplete

  • descriptions are vague

  • products contain conflicting information

  • categories overlap heavily

  • translations lack context

These outputs usually benefit from manual moderation.


Using confidence scores during moderation

Moderators can use confidence scores to prioritize review work.

Example strategy:

  • High-confidence outputs → lighter review

  • Medium-confidence outputs → standard moderation

  • Low-confidence outputs → detailed inspection

This helps scale moderation more efficiently.
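The tiering strategy above can be sketched as a small routing function. The 0.85 and 0.60 cut-offs are assumptions for illustration; teams tune them per workflow and category:

```python
def triage(confidence):
    """Map a confidence score (0.0-1.0) to a review tier.

    The 0.85 / 0.60 cut-offs are illustrative assumptions,
    not product defaults.
    """
    if confidence >= 0.85:
        return "lighter review"
    if confidence >= 0.60:
        return "standard moderation"
    return "detailed inspection"

print(triage(0.96))  # lighter review
print(triage(0.70))  # standard moderation
print(triage(0.54))  # detailed inspection
```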


Confidence scores are indicators, not guarantees

A high confidence score does not always mean the output is correct.

Similarly, low confidence does not always mean the output is wrong.

Confidence scores should be used as:

  • review indicators

  • prioritization tools

  • workflow guidance

rather than as absolute truth.


Improving confidence scores

Confidence scores often improve when:

  • source data becomes cleaner

  • prompts become more specific

  • workflows become more targeted

  • extracted attributes provide more context

Workflow optimization usually improves both confidence and output quality over time.


Improving AI reasoning quality

Better prompts often produce:

  • clearer reasoning

  • more transparent logic

  • stronger contextual explanations

Well-structured workflows also improve reasoning reliability.


Common reasons for low confidence

Low confidence scores may be caused by:

  • missing descriptions

  • weak supplier content

  • incomplete attributes

  • vague product names

  • unclear category structures

Improving source data quality often improves workflow performance significantly.


Best practices for using confidence scores

Prioritize low-confidence reviews

Focus moderation effort on:

  • uncertain outputs

  • edge cases

  • complex products

  • multilingual content

This improves operational efficiency.


Combine confidence with moderation

Confidence scoring works best when combined with:

  • human review

  • workflow moderation

  • prompt optimization

This creates safer automation pipelines.


Use category-specific workflows

Different categories often produce different confidence behavior.

Examples:

  • Fashion products

  • Electronics

  • Pet food

  • Technical equipment

Focused workflows improve extraction reliability.


Improve prompts continuously

Weak prompts often produce:

  • lower confidence

  • vague reasoning

  • inconsistent output

Prompt optimization is an important part of workflow management.


Example moderation flow

Example:
A webshop uses Attribute Extraction to detect:

  • Flavor

  • Lifecycle

Workflow results:

  • Product A → 98% confidence

  • Product B → 54% confidence

Moderators focus manual review on Product B because the AI is less certain about the extraction.

Reasoning helps reviewers understand why the AI selected the generated values.
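The flow above amounts to sorting workflow results so the least certain products surface first. The product names and scores below come from the example; the record shape is an illustrative assumption:

```python
# Example workflow results from the moderation flow above;
# the dict shape is an illustrative assumption.
results = [
    {"product": "Product A", "confidence": 0.98},
    {"product": "Product B", "confidence": 0.54},
]

# Least certain first, so moderators see Product B at the top
# of the review queue.
review_queue = sorted(results, key=lambda r: r["confidence"])
print([r["product"] for r in review_queue])  # ['Product B', 'Product A']
```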


Why confidence scoring and reasoning are important

Confidence scores and AI reasoning help businesses:

  • scale moderation safely

  • improve transparency

  • reduce manual review effort

  • identify weak source data

  • improve workflow quality over time

These features make AI Workflows more understandable, controllable, and scalable for large catalog operations.
