AI Workflows can display confidence scores and AI reasoning during workflow execution and moderation.
These features help teams better understand:
how certain the AI is about generated output
why the AI generated specific results
which products may require manual review
Confidence scoring and reasoning improve transparency and help maintain quality control inside automated workflows.
What are confidence scores?
Confidence scores indicate how certain the AI is about the generated result.
Higher confidence usually means:
clear source data
strong contextual information
reliable extraction or generation
Lower confidence may indicate:
incomplete product data
ambiguous descriptions
weak source information
uncertain classifications
Confidence scores help reviewers identify which outputs may require closer inspection.
What is AI reasoning?
AI reasoning explains why the AI generated specific output.
Reasoning provides visibility into:
extraction logic
classification decisions
translation choices
generated content behavior
This helps moderators better understand how the AI reached a result.
Why confidence scores matter
Large workflows may process:
thousands of products
multiple languages
incomplete supplier catalogs
inconsistent product structures
Reviewing every product manually is often unrealistic.
Confidence scores help teams:
prioritize moderation effort
focus on uncertain outputs
automate high-confidence results
improve workflow efficiency
Why AI reasoning matters
AI reasoning improves transparency and trust inside automation pipelines.
Without reasoning, reviewers only see the generated result.
With reasoning, reviewers also understand:
what information the AI used
how conclusions were reached
why specific values were selected
This improves moderation quality and debugging capabilities.
Where confidence scores appear
Confidence scores may appear during:
Attribute Extraction
Content Enrichment
Translation
Category Mapping
Validation workflows
Quality scoring workflows
Scores are typically visible inside:
test results
moderation screens
workflow result views
Where AI reasoning appears
AI reasoning may be visible inside:
workflow test results
moderation interfaces
action result screens
Reasoning is often accessible by opening the generated result details.
Example confidence score
Example:
Flavor extraction:
Flavor → Salmon
Confidence score → 96%
This indicates the AI is highly certain, based on the source information, that the product has a salmon-based flavor.
Example AI reasoning
Example reasoning:
"The product description references salmon based dry cat food intended for adult cats, therefore Flavor was assigned to Salmon and Lifecycle to Adult."
This helps reviewers understand why the extraction was generated.
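Put together, the two examples above can be pictured as a single result record. The sketch below is illustrative only: ExtractionResult and its field names are assumptions made for this article, not the actual workflow output format.

```python
from dataclasses import dataclass

@dataclass
class ExtractionResult:
    """One extracted attribute value plus its confidence and reasoning (hypothetical format)."""
    attribute: str     # extracted attribute, e.g. "Flavor"
    value: str         # generated value, e.g. "Salmon"
    confidence: float  # certainty as a fraction, e.g. 0.96 for 96%
    reasoning: str     # explanation of why the value was selected

# The salmon example above expressed as one record.
result = ExtractionResult(
    attribute="Flavor",
    value="Salmon",
    confidence=0.96,
    reasoning="The product description references salmon-based dry cat food "
              "intended for adult cats, therefore Flavor was assigned to Salmon.",
)
```

Keeping the score and the reasoning next to the generated value lets moderators see all three together during review.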
High-confidence outputs
High-confidence outputs often contain:
clear product descriptions
structured source data
strong contextual signals
unambiguous terminology
These outputs may require less manual review.
Low-confidence outputs
Low-confidence outputs may occur when:
supplier content is incomplete
descriptions are vague
products contain conflicting information
categories overlap heavily
translations lack context
These outputs usually benefit from manual moderation.
Using confidence scores during moderation
Moderators can use confidence scores to prioritize review work.
Example strategy:
High-confidence outputs → lighter review
Medium-confidence outputs → standard moderation
Low-confidence outputs → detailed inspection
This helps scale moderation more efficiently.
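One way a team might apply this strategy programmatically is a simple threshold mapping. The Python sketch below is a hypothetical illustration rather than product functionality, and the cut-off values are assumptions to be tuned per workflow.

```python
def review_tier(confidence: float) -> str:
    """Map a confidence score (0.0-1.0) to a moderation tier.

    The 0.85 / 0.60 cut-offs are assumptions; tune them per workflow.
    """
    if confidence >= 0.85:
        return "lighter review"       # high confidence
    if confidence >= 0.60:
        return "standard moderation"  # medium confidence
    return "detailed inspection"      # low confidence


print(review_tier(0.96))  # lighter review
print(review_tier(0.54))  # detailed inspection
```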
Confidence scores are indicators, not guarantees
A high confidence score does not always mean the output is correct.
Similarly, low confidence does not always mean the output is wrong.
Confidence scores should be used as:
review indicators
prioritization tools
workflow guidance
They should not be treated as absolute truth.
Improving confidence scores
Confidence scores often improve when:
source data becomes cleaner
prompts become more specific
workflows become more targeted
extracted attributes provide more context
Workflow optimization usually improves both confidence and output quality over time.
Improving AI reasoning quality
Better prompts often produce:
clearer reasoning
more transparent logic
stronger contextual explanations
Well-structured workflows also improve reasoning reliability.
Common reasons for low confidence
Low confidence scores may be caused by:
missing descriptions
weak supplier content
incomplete attributes
vague product names
unclear category structures
Improving source data quality often improves workflow performance significantly.
Best practices for using confidence scores
Prioritize low-confidence reviews
Focus moderation effort on:
uncertain outputs
edge cases
complex products
multilingual content
This improves operational efficiency.
Combine confidence with moderation
Confidence scoring works best when combined with:
human review
workflow moderation
prompt optimization
This creates safer automation pipelines.
Use category specific workflows
Different categories often produce different confidence behavior.
Examples:
Fashion products
Electronics
Pet food
Technical equipment
Focused workflows improve extraction reliability.
Improve prompts continuously
Weak prompts often produce:
lower confidence
vague reasoning
inconsistent output
Prompt optimization is an important part of workflow management.
Example moderation flow
Example:
A webshop uses Attribute Extraction to detect:
Flavor
Lifecycle
Workflow results:
Product A → 98% confidence
Product B → 54% confidence
Moderators focus manual review on Product B because the AI is less certain about the extraction.
Reasoning helps reviewers understand why the AI selected the generated values.
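A rough sketch of how this triage could be automated is shown below; the field names, threshold, and reasoning strings are invented for illustration and are not actual workflow output.

```python
# Hypothetical workflow results for the two products in the example.
results = [
    {"product": "Product A", "confidence": 0.98,
     "reasoning": "Description clearly states a salmon flavor for adult cats."},
    {"product": "Product B", "confidence": 0.54,
     "reasoning": "Flavor inferred from an ambiguous ingredient list."},
]

REVIEW_THRESHOLD = 0.85  # assumed cut-off; adjust to your catalog

# Route uncertain results to manual review, least certain first, and surface
# the reasoning so moderators can see why each value was selected.
needs_review = sorted(
    (r for r in results if r["confidence"] < REVIEW_THRESHOLD),
    key=lambda r: r["confidence"],
)
for r in needs_review:
    print(f'{r["product"]} ({r["confidence"]:.0%}): {r["reasoning"]}')
# Product B (54%): Flavor inferred from an ambiguous ingredient list.
```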
Why confidence scoring and reasoning are important
Confidence scores and AI reasoning help businesses:
scale moderation safely
improve transparency
reduce manual review effort
identify weak source data
improve workflow quality over time
These features make AI Workflows more understandable, controllable and scalable for large catalog operations.