![AI series, post 1: generative AI](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fbde0935e7038a45222e_AI_series_post1_generativeAI.jpg)
Most people think AI just got good recently.
Wrong.
We're not living through the arrival of AI. We're living in the aftermath of a breakthrough that happened in 2017.
Here's what changed, and why its impact reaches so far.
Yep, we've left the realm of fancy autocomplete.
We're watching models:
• Generate full working software
• Understand visual context
• Speak fluently across languages
• Operate tools autonomously
AI's been hiding in plain sight for years, powering everything from credit card fraud detection to GPS.
The shift came when we stopped writing rules and let models learn patterns on their own.
🤖 **What we're talking about:**
LLMs don't store facts.
They recognise patterns across language, data, and visuals.
And make shockingly accurate predictions.
It doesn't "know."
It still predicts, it's just weirdly good at it.
This leap didn't happen overnight. Let's rewind how we got here 👇
📜 **Pre-2010: Rules & Statistics**
Rigid models, endless if-then logic.
Machine learning for statistical prediction: think clustering models, decision trees, and the like.
Keyword matching ruled the day. Narrow, brittle, and dumb.
💡 **2010–2016: Neural Networks & Deep Learning**
Models began to learn representations.
AlphaGo beats Lee Sedol using reinforcement learning.
Still narrow. Impractical at scale. But promising.
🔥 **2017: Transformers**
The real breakthrough. An architecture that understands relationships between words.
Attention replaces recurrence, unlocking scalability.
This is the moment everything started shifting.
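To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation behind the Transformer, reduced to a single head in plain NumPy. It is illustrative only, not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single head of attention: every token looks at every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of each token to every other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

# toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q, K, V from the same tokens
print(out.shape)                                     # (4, 8)
```

Because every token can attend to every other token in parallel, there is no recurrence to wait for, which is what made training at scale practical.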
⚙️ **2018–2020: LLMs, GANs & Diffusion Models**
Train once on larger and larger data sets.
Text, images, audio, and video, generated through GANs (Generative Adversarial Networks) and diffusion models.
AI becomes general-purpose.
🚀 **2020–2022: Public Breakthrough**
ChatGPT launches on GPT-3.5. AI moves from "cool" to "can't ignore".
Strong zero-shot performance, now without task-specific training.
Reinforcement learning from human feedback becomes standard.
Audio models go multilingual. Image models sharpen with Diffusion.
🌍 **2023–now: Emergence & Scale**
Models start doing things they weren't explicitly trained to do. This is where it gets scary :)
Tool use with MCP. Workflow orchestration. Process automation.
Broad capabilities with even broader application.
We're not just prompting anymore, we're delegating.
![AI series, post 2: grokking](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fc33efb484320c665436_AI_series_post2_grokking.jpg)
If you're trying to make sense of why AI progress feels chaotic, here's the uncomfortable truth:
Itโs not linear.
Large Language Models didnโt slowly improve year by year.
They crossed thresholds.
The history of LLMs is defined by step changes, not slow improvement.
For years, models looked… underwhelming.
Then suddenly, sometimes after months of training, they "get it".
This phenomenon is called **grokking**.
For a long time, a model behaves like this:
• It memorises examples
• Outputs are brittle
• Performance feels inconsistent or random
🧠 Then the model stops memorising. **It starts understanding structure.**
Not facts. But relationships.
Grokking is one of the clearest examples of why AI progress feels unpredictable. It's a sharp, non-linear jump in capability. A model can appear stuck or mediocre, then abruptly transition from surface-level pattern matching to deep structural understanding. It moves from overfitting to generalisation and its capability spikes almost vertically. Not because it learned new rules, but because it learned the relationships.
In early 2022, researchers formally described this behaviour in the paper *Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets*. They observed models that showed no meaningful improvement on unseen data, until suddenly they did. Capability didn't improve the way you'd expect it to.
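For a feel of the setup, here is a minimal sketch of the kind of task the paper studied: a small algorithmic dataset such as modular addition, with most of the table held out. The modulus and split are illustrative choices, not the paper's exact configuration:

```python
import itertools
import random

p = 97                                              # small prime modulus for a modular-arithmetic task
pairs = list(itertools.product(range(p), repeat=2)) # every possible (a, b) pair
random.seed(0)
random.shuffle(pairs)

split = int(0.4 * len(pairs))                       # train on 40% of the table, hold out the rest
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
held_out = [((a, b), (a + b) % p) for a, b in pairs[split:]]

print(f"{len(train)} training pairs, {len(held_out)} held-out pairs")
# A small transformer fit on `train` memorises it quickly; accuracy on `held_out`
# can sit near chance for a long time and then jump abruptly -- that jump is grokking.
```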
🚨 **This matters more than most organisations realise.**
Because AI pilots don't behave like traditional software.
• Early underperformance doesn't always predict final value
• Breakthroughs often arrive without warning
• Safety, governance, and control frameworks must assume future behaviours, not current ones, before anything moves to production
This is why "wait and see" is a risky strategy.
And why treating AI as just another tool upgrade misses the point.
AI doesnโt evolve like people.
Thatโs what makes it powerful.
And risky.
And deeply fascinating.
![AI series, post 3: prompt & context](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fc4707eeaf2d5516c41a_AI_series_post3_promtcontext.jpg)
Letโs be honest.
Most "AI failures" aren't model failures.
Theyโre use case and data governance failures.
If AI keeps hallucinating, the solution is not: "Write a better prompt."
It is: "Give it clear data to work with, and set boundaries."
**Prompt** = how the model should think
**Context** = what the model can think about
**Model** = how well it can think
Prompting shapes behaviour; context determines truth.
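A minimal sketch of that separation, assuming a generic chat-style interface. `call_llm` is a stub standing in for whichever provider you actually use, and the log entries are made up:

```python
def call_llm(messages: list[dict]) -> str:
    """Stub standing in for a real model call."""
    return '{"Risk": "...", "Impact": "...", "Mitigation": "..."}'

PROMPT = (  # how the model should think
    "You are a senior risk consultant advising the CEO. "
    "Return JSON with the fields: Risk, Impact, Mitigation."
)

CONTEXT = (  # what the model can think about
    "Incident log, last 6 months:\n"
    "- 2x unplanned ERP downtime (> 4 hours)\n"
    "- 1x supplier data breach, contained within 24 hours\n"
)

answer = call_llm([
    {"role": "system", "content": PROMPT},                               # behaviour + structure
    {"role": "user", "content": f"{CONTEXT}\nWhat are our top risks?"},  # facts + question
])
print(answer)
```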
What **prompting** does really well:
• Behaviour: "You are a senior risk consultant advising the CEO."
• Structure: "Return JSON with fields: Risk, Impact, Mitigation."
• Tasks: "First classify, then generate SQL, then explain."
And where it fails:
• Missing information: "Assume the client uses SAP S/4HANA."
• Forcing accuracy: "Only answer if you are 100% sure."
• Domain expertise: "Here's 20 pages of IFRS 17…"
Prompting controls how the model behaves. Not what it knows.
Context provides factual grounding; it's not for instructions or behaviour.
What **context** does really well:
• Facts: "Here are 6 months of incident logs…"
• Structure: Schemas, process models, policies.
• Tool output: SQL results, APIs, metrics.
Context is what turns an LLM from generic into specific: performing in a known environment.
From toy to tool. From chatbot to decision engine. From demo to production.
How to get that context?
**RAG** (Retrieval Augmented Generation) pulls in external documents at runtime and injects them into the model's context. It's ideal for large or fast-changing knowledge, like searching policy documents or answering questions over recent research. But if retrieval fails, the model confidently makes things up.
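A rough sketch of the flow; naive keyword matching stands in for a real vector store, and `call_llm` and the documents are placeholders:

```python
DOCUMENTS = {
    "travel_policy.md": "Flights above EUR 500 require pre-approval by the budget owner.",
    "expense_policy.md": "Receipts must be submitted within 30 days of purchase.",
}

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer grounded in the retrieved context)"

def retrieve(question: str, k: int = 1) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the question.
    Real systems embed the question and search a vector index instead."""
    question_words = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS.values(),
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_rag("Do I need approval for a 700 euro flight?"))
# If retrieve() brings back the wrong document, the model still answers confidently --
# exactly the failure mode described above.
```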
**KAG** (Knowledge Augmented Generation) integrates structured knowledge graphs into reasoning. Instead of guessing from text, it reasons over known entities and relationships. Perfect for compliance, diagnostics, or domains where logic beats language.
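A minimal sketch of the idea, with a hand-written triple store standing in for a real knowledge graph; the entities and relations are purely illustrative:

```python
KNOWLEDGE_GRAPH = [                     # (subject, predicate, object) triples
    ("Contract-0042", "governed_by", "IFRS 17"),
    ("IFRS 17", "requires", "risk adjustment disclosure"),
    ("Contract-0042", "owned_by", "Finance"),
]

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer grounded in the graph facts)"

def facts_about(entity: str) -> list[str]:
    """Pull the triples that mention the entity instead of guessing from free text."""
    return [f"{s} --{p}--> {o}" for s, p, o in KNOWLEDGE_GRAPH if entity in (s, o)]

def answer_with_kag(question: str, entity: str) -> str:
    facts = "\n".join(facts_about(entity))
    prompt = f"Reason only over these facts:\n{facts}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_kag("Which disclosures does this contract require?", "Contract-0042"))
```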
**CAG** (Cache Augmented Generation) skips retrieval entirely and injects data straight into the context. It's fast, cheap, and reliable, but it only works when your knowledge is stable and fits in the context window; a good fit for static data like manuals or internal policies.
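A minimal sketch, assuming the stable knowledge comfortably fits in the context window; the policy text and `call_llm` are placeholders:

```python
STATIC_KNOWLEDGE = "\n".join([          # loaded once, e.g. at service start-up
    "Internal policy v3.2:",
    "- Access requests are approved by the data owner.",
    "- Production changes require a peer review.",
])

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer based on the cached policy text)"

def answer_with_cag(question: str) -> str:
    # no retrieval step: every request reuses the same cached context,
    # so nothing can be "missed", but stale knowledge lives until the cache is refreshed
    return call_llm(f"{STATIC_KNOWLEDGE}\n\nQuestion: {question}")

print(answer_with_cag("Who approves access requests?"))
```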
**Hybrid** systems are common: CAG for core rules, RAG for real-time facts, and KAG for reasoning over domain logic.
AI projects fail from over-investment in prompts and under-investment in data pipelines. And it's hybrid setups that deliver, with precision and agility where needed.
![AI series, post 4: model training and tuning](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979e3789a450a50d283bb12_AI_thought_leadership_4_AI_ModelTraining.jpg)
Training made AI smart. Tuning made it useful.
We're going through a change in how models progress. It used to be about throwing ever larger data sets at them. Now improvements come from teaching models how to behave better, and when to think harder.
This shift isn't just because we've run out of training data, but because people's expectations changed. We've gone beyond summarising or document creation. It's about automating what we do: from concept to implementation, from interaction to autonomy.
In the posts above, we covered:
• Transformers as the start of AI adoption
• Grokking, the sudden jump in capability
• And the value of context and prompting
Now, how models start to bring business value 👇
🧠 **Training vs Post-Training**
Pre-training creates capability: a generalist that's reasonably good at everything… and occasionally wrong with confidence.
Post-training is where models learn how to follow instructions, when to refuse, and what "good" looks like.
Reinforcement Learning from Human Feedback (RLHF), the original post-training method, is what made ChatGPT helpful. Modern AI combines:
⚖️ **Direct Preference Optimisation**
• Show pairs of answers to the same prompt
• Humans (or rules) select the better one
• The model is trained to raise the probability of the preferred output
RL is slow & expensive; DPO is simple & fast.
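A minimal sketch of the DPO objective on a single (chosen, rejected) pair. In practice the log-probabilities come from the policy being trained and a frozen reference model; the numbers below are placeholders:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Push the policy to prefer the chosen answer more strongly than the reference model does."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))   # -log(sigmoid(margin))

# placeholder log-probabilities of the two full answers under each model
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-11.5,
                ref_chosen=-12.4, ref_rejected=-11.2)
print(round(loss, 4))   # the loss shrinks as the chosen answer becomes relatively more likely
```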
📊 **Group Relative Policy Optimisation**
Pairwise comparisons don't scale when:
• Answers are complex
• Multiple dimensions matter
• You care about relative quality
So not A vs B, but ranking whole groups of answers. Good for optimising multi-step trajectories.
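A minimal sketch of the group-relative part: score a whole group of candidate answers to one prompt and compute each answer's advantage against its own group. The rewards below are placeholder numbers:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Each answer is judged against its group, not against a single paired alternative."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# e.g. five sampled answers to the same prompt, scored by a grader
rewards = [0.2, 0.9, 0.4, 0.4, 0.7]
print([round(a, 2) for a in group_relative_advantages(rewards)])
# above-average answers get positive advantages (reinforced), below-average ones negative --
# no pairwise A-vs-B labels required
```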
🧪 **Graders & Verifiers**
• Graders score against criteria: "Did it have a source?", "Does the code compile?"
• Verifiers check correctness after generation: reasoning steps, constraint checks.
Graders give you quality signals, verifiers give you validation, and together they enable smarter, step-based routing. All without touching the base model.
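A minimal sketch of the two roles wrapped around a stubbed model call; the criteria are illustrative:

```python
import ast

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "def add(a, b):\n    return a + b\n# source: internal coding guide"

def grade(answer: str) -> float:
    """Grader: soft score against a criterion such as 'did it cite a source?'."""
    return 1.0 if "source:" in answer.lower() else 0.0

def verify_python(answer: str) -> bool:
    """Verifier: hard check after generation -- does the generated code at least parse?"""
    try:
        ast.parse(answer)
        return True
    except SyntaxError:
        return False

answer = call_llm("Write an add() function and cite your source.")
print("grade:", grade(answer), "| verifies:", verify_python(answer))
```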
⚡ **Inference-Time Scaling**
Thinking harder only when needed. Generating many answers, verifying, and selecting. It avoids retraining, is easy to govern, and only hard cases cost more compute.
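A minimal best-of-N sketch: generate several candidates, score them, keep the best. Sampling and scoring are placeholders here; in practice the scorer is a grader or verifier like the ones above:

```python
import random

def call_llm(prompt: str, temperature: float = 1.0) -> str:
    """Stub standing in for a sampled model call."""
    return f"candidate answer #{random.randint(1, 100)}"

def score(answer: str) -> float:
    """Stub grader/verifier; in practice this is the hard part."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [call_llm(prompt, temperature=1.0) for _ in range(n)]
    return max(candidates, key=score)          # spend compute on selection, not retraining

# easy requests can run with n=1; only the hard cases pay for a larger n
print(best_of_n("Reconcile these two ledgers and explain the differences.", n=8))
```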
There you go: **four methods** to make modern models more trustworthy.
🚨 **The uncomfortable lesson**
• Fine-tuning doesn't fix unclear requirements
• Most AI issues are measurement failures, not model issues
• If humans can't agree what "good" looks like, neither can the model
We're not training intelligence anymore. We're creating incentive systems.
Not what it can do, but what it is rewarded to do.
🎯 **Steps**
1. Define what is "good" (explicitly)
2. Encode it in graders & evaluations
3. Route to cheap models (limit reasoning)
4. Freeze a regression set (so nothing breaks)
5. Train as last step
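A minimal sketch of steps 2 and 4 together: encode "good" in a grader and freeze a small regression set that every change must keep passing. The cases, grader, and stubbed system under test are illustrative assumptions:

```python
REGRESSION_SET = [   # frozen: never edited just to make a new model version look better
    {"prompt": "Summarise invoice INV-123", "must_contain": ["total", "due date"]},
    {"prompt": "Classify this support ticket", "must_contain": ["category"]},
]

def call_llm(prompt: str) -> str:
    """Stub standing in for the system under test."""
    return "Category: billing. Summary includes the total amount and the due date."

def grade(case: dict, answer: str) -> bool:
    """Encodes one explicit definition of 'good' for this case."""
    return all(term in answer.lower() for term in case["must_contain"])

results = [grade(case, call_llm(case["prompt"])) for case in REGRESSION_SET]
print(f"{sum(results)}/{len(results)} regression cases pass")
assert all(results), "a previously working behaviour regressed -- block the change"
```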
AI doesn't optimise for:
• truth
• usefulness
• safety
• customer happiness
It optimises for what you score, rank, reward, or select!
![AI series, post 5: AI agents & tools](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979e382681da0f35927922d_AI_thought_leadership_5_AI_Agents_Tools.jpg)
It gave great answers, but nothing really happened…
Well, that's how LLMs started. Now it's ChatGPT Agent & Operator, Claude Cowork, Copilot Studio, or any AI workflow orchestration tool. LLMs have gained the ability to act (*do* instead of just *say* things).
🤖 **AI Agents**
Where traditional LLMs answer questions, an AI Agent plans, executes, checks, and adapts. Agentic AI? Digital workers that behave like teammates, not like just another tool.
This requires:
• **Reasoning**: Break complex goals into small tasks
• **Memory**: Track past steps and results
• **Tool Use**: Call APIs, trigger systems, act on data
• **Goal Orientation**: Stay focused, avoid distractions
Agents plan paths, execute steps, check their own results, and adapt. This represents a shift from a tool you talk to into a system that works with you.
However, effectiveness depends on a valid use case and solid design (incl. data it can handle, clear constraints, etc.).
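A minimal sketch of that plan, execute, check, adapt loop. The planner, tools, and checker are hard-coded placeholders here; a real agent delegates those decisions to the model:

```python
def plan(goal: str) -> list[str]:
    """Reasoning: break the goal into small steps (a real agent asks the model to do this)."""
    return ["fetch_overdue_invoices", "draft_reminder_emails"]

TOOLS = {                                   # Tool use: actions the agent is allowed to take
    "fetch_overdue_invoices": lambda: ["INV-123", "INV-456"],
    "draft_reminder_emails": lambda: "2 drafts created",
}

def check(result) -> bool:
    """Simple verifier on each step's result."""
    return result not in (None, [], "")

def run_agent(goal: str) -> dict:
    memory = {"goal": goal, "steps": []}    # Memory: track past steps and results
    for step in plan(goal):                 # Goal orientation: only steps that serve the goal
        result = TOOLS[step]()
        if not check(result):               # adapt: here we simply stop and report the failure
            memory["steps"].append((step, "failed"))
            break
        memory["steps"].append((step, result))
    return memory

print(run_agent("Chase overdue invoices"))
```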
If Agentic AI is the **brain** that decides to act, tool use is the **hands** that perform the action. To be truly useful in a business context, AI must be able to interact with the real world by querying databases, triggering workflows, or calling APIs.
💡 **MCP**
The Model Context Protocol was introduced by Anthropic in 2024 and has quickly become the standard for connecting AI to other systems. It's an open protocol that uses JSON-RPC messages for communication between models, tools, and data sources.
Before it, every AI + tool integration was custom. Now MCP tells the AI what tools exist, what they do, and when and how to use them.
MCP is infrastructure (not hype): a way to do model-agnostic AI integration. The MCP server acts as the tool provider. The MCP client is the LLM host that:
• Discovers available tools dynamically
• Understands inputs/outputs via schemas
• Calls tools without hard-coded logic
❌ **Before MCP**
• Every LLM had its own tool API
• Strict coupling between model & tools
• Re-implementation of the same integration over and over
• Difficult to swap LLMs
• Brittle if anything changed
✅ **After MCP**
• Build a connector once
• Swap LLMs without rewriting
• Integration that's easy to configure and change
• Agents become maintainable
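A conceptual sketch (not the full protocol) of what MCP standardises: the server describes its tools with JSON schemas, and any client can discover and call them without per-tool integration logic. The tool name, fields, and stubbed result below are illustrative:

```python
import json

TOOLS = [
    {
        "name": "query_process_data",
        "description": "Run a read-only query against the client's process data model.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

def list_tools() -> str:
    """What a server advertises when a client asks which tools exist."""
    return json.dumps(TOOLS, indent=2)

def call_tool(name: str, arguments: dict) -> dict:
    """Generic dispatch: the client never hard-codes per-tool logic."""
    if name == "query_process_data":
        return {"rows": [{"case_id": "C-1", "throughput_days": 12}]}   # stubbed query result
    raise ValueError(f"unknown tool: {name}")

print(list_tools())
print(call_tool("query_process_data", {"query": "average throughput time per case"}))
```

Swap the model behind the client and nothing on the server side needs to change; that decoupling is the point.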
At Intellifold, we run our own MCP server to create a seamless bridge between our platform, the client's data model, and the LLM. This makes sure the AI isn't "guessing" or returning generic answers. Instead, it works directly with the data it has access to.
The LLM reads the context, transforms business questions into queries, the MCP server executes them, and the LLM returns structured responses. From tough business questions and changing dashboards… to creating new solutions and improvement recommendations.
All made possible through smart design, an understanding of system data, and years of improving business processes.
![Diagram: the Intellifold MCP server connecting the platform, the client data model, and the LLM](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979ece1d1176ebd6f93b5d0_MCP_Intellifold.jpg)
Gain full process transparency, spot inefficiencies instantly, and drive automation with real-time analytics and AI-powered monitoring across your business operations.
