![AI series, post 1: generative AI](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fbde0935e7038a45222e_AI_series_post1_generativeAI.jpg)
Most people think AI just got good recently.
Wrong.
We're not living through the arrival of AI. We're living in the aftermath of a breakthrough that happened in 2017.
Here's what changed, and why its impact reaches so far.
Yep, we've left the realm of fancy autocomplete.
We're watching models:
• Generate full working software
• Understand visual context
• Speak fluently across languages
• Operate tools autonomously
AI's been hiding in plain sight for years, powering everything from credit card fraud detection to GPS.
The shift came when we stopped writing rules and let models learn patterns on their own.
🤖 **What we're talking about:**
LLMs don't store facts.
They recognise patterns across language, data, and visuals.
And make shockingly accurate predictions.
It doesn't "know."
It still predicts, it's just weirdly good at it.
This leap didn't happen overnight. Let's rewind how we got here 👇
📜 **Pre-2010: Rules & Statistics**
Rigid models, endless if-then logic.
Machine learning for statistical prediction: think clustering models, decision trees, and the like.
Keyword matching ruled the day. Narrow, brittle, and dumb.
💡 **2010–2016: Neural Networks & Deep Learning**
Models began to learn representations.
AlphaGo beats Lee Sedol using reinforcement learning.
Still narrow. Impractical at scale. But promising.
🔥 **2017: Transformers**
The real breakthrough. An architecture that understands relationships between words.
Attention replaces recurrence, unlocking scalability.
This is the moment everything started shifting.
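To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation behind the Transformer, reduced to a single head in plain NumPy. It is illustrative only, not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single head of attention: every token looks at every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of each token to every other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

# toy example: a "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q, K, V from the same tokens
print(out.shape)                                     # (4, 8)
```

Because every token can attend to every other token in parallel, there is no recurrence to wait for, which is what made training at scale practical.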
⚙️ **2018–2020: LLMs, GANs & Diffusion Models**
Train once on larger and larger data sets.
Text, images, audio, and video, generated through GANs (Generative Adversarial Networks) and diffusion models.
AI becomes general-purpose.
🚀 **2020–2022: Public Breakthrough**
ChatGPT launches on GPT-3.5. AI moves from "cool" to "can't ignore".
Strong zero-shot performance, now without task-specific training.
Reinforcement learning from human feedback becomes standard.
Audio models go multilingual. Image models sharpen with Diffusion.
🌍 **2023–now: Emergence & Scale**
Models start doing things they weren't explicitly trained to do. This is where it gets scary :)
Tool use with MCP. Workflow orchestration. Process automation.
Broad capabilities with even broader application.
We're not just prompting anymore, we're delegating.
![AI series, post 2: grokking](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fc33efb484320c665436_AI_series_post2_grokking.jpg)
If you're trying to make sense of why AI progress feels chaotic, here's the uncomfortable truth:
Itโs not linear.
Large Language Models didnโt slowly improve year by year.
They crossed thresholds.
The history of LLMs is defined by step changes, not slow improvement.
For years, models looked… underwhelming.
Then suddenly, sometimes after months of training, they "get it".
This phenomenon is called **grokking**.
For a long time, a model behaves like this:
• It memorises examples
• Outputs are brittle
• Performance feels inconsistent or random
🧠 Then the model stops memorising. **It starts understanding structure.**
Not facts. But relationships.
Grokking is one of the clearest examples of why AI progress feels unpredictable. It's a sharp, non-linear jump in capability. A model can appear stuck or mediocre, then abruptly transition from surface-level pattern matching to deep structural understanding. It moves from overfitting to generalisation and its capability spikes almost vertically. Not because it learned new rules, but because it learned the relationships.
In early 2022, researchers formally described this behaviour in the paper *Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets*. They observed models that showed no meaningful improvement on unseen data, until suddenly they did. Capability didn't improve the way you'd expect it to.
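For a feel of the setup, here is a minimal sketch of the kind of task the paper studied: a small algorithmic dataset such as modular addition, with most of the table held out. The modulus and split are illustrative choices, not the paper's exact configuration:

```python
import itertools
import random

p = 97                                              # small prime modulus for a modular-arithmetic task
pairs = list(itertools.product(range(p), repeat=2)) # every possible (a, b) pair
random.seed(0)
random.shuffle(pairs)

split = int(0.4 * len(pairs))                       # train on 40% of the table, hold out the rest
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
held_out = [((a, b), (a + b) % p) for a, b in pairs[split:]]

print(f"{len(train)} training pairs, {len(held_out)} held-out pairs")
# A small transformer fit on `train` memorises it quickly; accuracy on `held_out`
# can sit near chance for a long time and then jump abruptly -- that jump is grokking.
```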
🚨 **This matters more than most organisations realise.**
Because AI pilots don't behave like traditional software.
• Early underperformance doesn't always predict final value
• Breakthroughs often arrive without warning
• Safety, governance, and control frameworks must assume future behaviours, not current ones, before anything moves to production
This is why "wait and see" is a risky strategy.
And why treating AI as just another tool upgrade misses the point.
AI doesnโt evolve like people.
Thatโs what makes it powerful.
And risky.
And deeply fascinating.
![AI series, post 3: prompt & context](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6971fc4707eeaf2d5516c41a_AI_series_post3_promtcontext.jpg)
Letโs be honest.
Most "AI failures" aren't model failures.
Theyโre use case and data governance failures.
If AI keeps hallucinating, the solution is not: "Write a better prompt."
It is: "Give it clear data to work with, and set boundaries."
**Prompt** = how the model should think
**Context** = what the model can think about
**Model** = how well it can think
Prompting shapes behaviour; context determines truth.
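A minimal sketch of that separation, assuming a generic chat-style interface. `call_llm` is a stub standing in for whichever provider you actually use, and the log entries are made up:

```python
def call_llm(messages: list[dict]) -> str:
    """Stub standing in for a real model call."""
    return '{"Risk": "...", "Impact": "...", "Mitigation": "..."}'

PROMPT = (  # how the model should think
    "You are a senior risk consultant advising the CEO. "
    "Return JSON with the fields: Risk, Impact, Mitigation."
)

CONTEXT = (  # what the model can think about
    "Incident log, last 6 months:\n"
    "- 2x unplanned ERP downtime (> 4 hours)\n"
    "- 1x supplier data breach, contained within 24 hours\n"
)

answer = call_llm([
    {"role": "system", "content": PROMPT},                               # behaviour + structure
    {"role": "user", "content": f"{CONTEXT}\nWhat are our top risks?"},  # facts + question
])
print(answer)
```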
What **prompting** does really well:
• Behaviour: "You are a senior risk consultant advising the CEO."
• Structure: "Return JSON with fields: Risk, Impact, Mitigation."
• Tasks: "First classify, then generate SQL, then explain."
And where it fails:
• Missing information: "Assume the client uses SAP S/4HANA."
• Forcing accuracy: "Only answer if you are 100% sure."
• Domain expertise: "Here's 20 pages of IFRS 17…"
Prompting controls how the model behaves. Not what it knows.
Context provides factual grounding; it's not for instructions or behaviour.
What **context** does really well:
• Facts: "Here are 6 months of incident logs…"
• Structure: Schemas, process models, policies.
• Tool output: SQL results, APIs, metrics.
Context is what turns an LLM from generic into specific: performing in a known environment.
From toy to tool. From chatbot to decision engine. From demo to production.
How to get that context?
**RAG** (Retrieval Augmented Generation) pulls in external documents at runtime and injects them into the model's context. It's ideal for large or fast-changing knowledge, like searching policy documents or answering questions over recent research. But if retrieval fails, the model confidently makes things up.
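A rough sketch of the flow; naive keyword matching stands in for a real vector store, and `call_llm` and the documents are placeholders:

```python
DOCUMENTS = {
    "travel_policy.md": "Flights above EUR 500 require pre-approval by the budget owner.",
    "expense_policy.md": "Receipts must be submitted within 30 days of purchase.",
}

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer grounded in the retrieved context)"

def retrieve(question: str, k: int = 1) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the question.
    Real systems embed the question and search a vector index instead."""
    question_words = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS.values(),
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_rag("Do I need approval for a 700 euro flight?"))
# If retrieve() brings back the wrong document, the model still answers confidently --
# exactly the failure mode described above.
```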
**KAG** (Knowledge Augmented Generation) integrates structured knowledge graphs into reasoning. Instead of guessing from text, it reasons over known entities and relationships. Perfect for compliance, diagnostics, or domains where logic beats language.
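A minimal sketch of the idea, with a hand-written triple store standing in for a real knowledge graph; the entities and relations are purely illustrative:

```python
KNOWLEDGE_GRAPH = [                     # (subject, predicate, object) triples
    ("Contract-0042", "governed_by", "IFRS 17"),
    ("IFRS 17", "requires", "risk adjustment disclosure"),
    ("Contract-0042", "owned_by", "Finance"),
]

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer grounded in the graph facts)"

def facts_about(entity: str) -> list[str]:
    """Pull the triples that mention the entity instead of guessing from free text."""
    return [f"{s} --{p}--> {o}" for s, p, o in KNOWLEDGE_GRAPH if entity in (s, o)]

def answer_with_kag(question: str, entity: str) -> str:
    facts = "\n".join(facts_about(entity))
    prompt = f"Reason only over these facts:\n{facts}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_kag("Which disclosures does this contract require?", "Contract-0042"))
```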
**CAG** (Cache Augmented Generation) skips retrieval entirely and injects data straight into the context. It's fast, cheap, and reliable, but it only works when your knowledge is stable and fits in the context window; a good fit for static data like manuals or internal policies.
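A minimal sketch, assuming the stable knowledge comfortably fits in the context window; the policy text and `call_llm` are placeholders:

```python
STATIC_KNOWLEDGE = "\n".join([          # loaded once, e.g. at service start-up
    "Internal policy v3.2:",
    "- Access requests are approved by the data owner.",
    "- Production changes require a peer review.",
])

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "(answer based on the cached policy text)"

def answer_with_cag(question: str) -> str:
    # no retrieval step: every request reuses the same cached context,
    # so nothing can be "missed", but stale knowledge lives until the cache is refreshed
    return call_llm(f"{STATIC_KNOWLEDGE}\n\nQuestion: {question}")

print(answer_with_cag("Who approves access requests?"))
```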
**Hybrid** systems are common: CAG for core rules, RAG for real-time facts, and KAG for reasoning over domain logic.
AI projects fail from over-investment in prompts and under-investment in data pipelines. And it's hybrid setups that deliver, with precision and agility where needed.
![AI series, post 4: model training and tuning](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979e3789a450a50d283bb12_AI_thought_leadership_4_AI_ModelTraining.jpg)
Training made AI smart. Tuning made it useful.
We're going through a change in how models progress. It used to be about throwing ever larger data sets at them. Now improvements come from teaching models how to behave better, and when to think harder.
This shift isn't just because we've run out of training data, but because people's expectations changed. We've gone beyond summarising or document creation. It's about automating what we do: from concept to implementation, from interaction to autonomy.
In the posts above, we covered:
• Transformers as the start of AI adoption
• Grokking, the sudden jump in capability
• And the value of context and prompting
Now, how models start to bring business value 👇
🧠 **Training vs Post-Training**
Pre-training creates capability: a generalist that's reasonably good at everything… and occasionally wrong with confidence.
Post-training is where models learn how to follow instructions, when to refuse, and what "good" looks like.
Reinforcement Learning from Human Feedback (RLHF), the original post-training method, is what made ChatGPT helpful. Modern AI combines:
⚖️ **Direct Preference Optimisation**
• Show pairs of answers to the same prompt
• Humans (or rules) select the better one
• The model is trained to raise the probability of the preferred output
RL is slow & expensive; DPO is simple & fast.
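A minimal sketch of the DPO objective on a single (chosen, rejected) pair. In practice the log-probabilities come from the policy being trained and a frozen reference model; the numbers below are placeholders:

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Push the policy to prefer the chosen answer more strongly than the reference model does."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))   # -log(sigmoid(margin))

# placeholder log-probabilities of the two full answers under each model
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-11.5,
                ref_chosen=-12.4, ref_rejected=-11.2)
print(round(loss, 4))   # the loss shrinks as the chosen answer becomes relatively more likely
```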
📊 **Group Relative Policy Optimisation**
Pairwise comparisons don't scale when:
• Answers are complex
• Multiple dimensions matter
• You care about relative quality
So not A vs B, but ranking whole groups of answers. Good for optimising multi-step trajectories.
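A minimal sketch of the group-relative part: score a whole group of candidate answers to one prompt and compute each answer's advantage against its own group. The rewards below are placeholder numbers:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Each answer is judged against its group, not against a single paired alternative."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# e.g. five sampled answers to the same prompt, scored by a grader
rewards = [0.2, 0.9, 0.4, 0.4, 0.7]
print([round(a, 2) for a in group_relative_advantages(rewards)])
# above-average answers get positive advantages (reinforced), below-average ones negative --
# no pairwise A-vs-B labels required
```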
🧪 **Graders & Verifiers**
• Graders score against criteria: "Did it have a source?", "Does the code compile?"
• Verifiers check correctness after generation: reasoning steps, constraint checks.
Graders give you quality signals, verifiers give you validation, and together they enable smarter, step-based routing. All without touching the base model.
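A minimal sketch of the two roles wrapped around a stubbed model call; the criteria are illustrative:

```python
import ast

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "def add(a, b):\n    return a + b\n# source: internal coding guide"

def grade(answer: str) -> float:
    """Grader: soft score against a criterion such as 'did it cite a source?'."""
    return 1.0 if "source:" in answer.lower() else 0.0

def verify_python(answer: str) -> bool:
    """Verifier: hard check after generation -- does the generated code at least parse?"""
    try:
        ast.parse(answer)
        return True
    except SyntaxError:
        return False

answer = call_llm("Write an add() function and cite your source.")
print("grade:", grade(answer), "| verifies:", verify_python(answer))
```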
⚡ **Inference-Time Scaling**
Thinking harder only when needed. Generating many answers, verifying, and selecting. It avoids retraining, is easy to govern, and only hard cases cost more compute.
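A minimal best-of-N sketch: generate several candidates, score them, keep the best. Sampling and scoring are placeholders here; in practice the scorer is a grader or verifier like the ones above:

```python
import random

def call_llm(prompt: str, temperature: float = 1.0) -> str:
    """Stub standing in for a sampled model call."""
    return f"candidate answer #{random.randint(1, 100)}"

def score(answer: str) -> float:
    """Stub grader/verifier; in practice this is the hard part."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [call_llm(prompt, temperature=1.0) for _ in range(n)]
    return max(candidates, key=score)          # spend compute on selection, not retraining

# easy requests can run with n=1; only the hard cases pay for a larger n
print(best_of_n("Reconcile these two ledgers and explain the differences.", n=8))
```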
There you go: **four methods** to make modern models more trustworthy.
🚨 **The uncomfortable lesson**
• Fine-tuning doesn't fix unclear requirements
• Most AI issues are measurement failures, not model issues
• If humans can't agree what "good" looks like, neither can the model
We're not training intelligence anymore. We're creating incentive systems.
Not what it can do, but what it is rewarded to do.
🎯 **Steps**
1. Define what is "good" (explicitly)
2. Encode it in graders & evaluations
3. Route to cheap models (limit reasoning)
4. Freeze a regression set (so nothing breaks)
5. Train as last step
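A minimal sketch of steps 2 and 4 together: encode "good" in a grader and freeze a small regression set that every change must keep passing. The cases, grader, and stubbed system under test are illustrative assumptions:

```python
REGRESSION_SET = [   # frozen: never edited just to make a new model version look better
    {"prompt": "Summarise invoice INV-123", "must_contain": ["total", "due date"]},
    {"prompt": "Classify this support ticket", "must_contain": ["category"]},
]

def call_llm(prompt: str) -> str:
    """Stub standing in for the system under test."""
    return "Category: billing. Summary includes the total amount and the due date."

def grade(case: dict, answer: str) -> bool:
    """Encodes one explicit definition of 'good' for this case."""
    return all(term in answer.lower() for term in case["must_contain"])

results = [grade(case, call_llm(case["prompt"])) for case in REGRESSION_SET]
print(f"{sum(results)}/{len(results)} regression cases pass")
assert all(results), "a previously working behaviour regressed -- block the change"
```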
AI doesn't optimise for:
• truth
• usefulness
• safety
• customer happiness
It optimises for what you score, rank, reward, or select!
![AI series, post 5: AI agents & tools](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979e382681da0f35927922d_AI_thought_leadership_5_AI_Agents_Tools.jpg)
It gave great answers, but nothing really happened…
Well, that's how LLMs started. Now it's ChatGPT Agent & Operator, Claude Cowork, Copilot Studio, or any AI workflow orchestration tool. LLMs have gained the ability to act (*do* instead of just *say* things).
🤖 **AI Agents**
Where traditional LLMs answer questions, an AI Agent plans, executes, checks, and adapts. Agentic AI? Digital workers that behave like teammates, not like just another tool.
This requires:
• **Reasoning**: Break complex goals into small tasks
• **Memory**: Track past steps and results
• **Tool Use**: Call APIs, trigger systems, act on data
• **Goal Orientation**: Stay focused, avoid distractions
Agents plan paths, execute steps, check their own results, and adapt. This represents a shift from a tool you talk to into a system that works with you.
However, effectiveness depends on a valid use case and solid design (incl. data it can handle, clear constraints, etc.).
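A minimal sketch of that plan, execute, check, adapt loop. The planner, tools, and checker are hard-coded placeholders here; a real agent delegates those decisions to the model:

```python
def plan(goal: str) -> list[str]:
    """Reasoning: break the goal into small steps (a real agent asks the model to do this)."""
    return ["fetch_overdue_invoices", "draft_reminder_emails"]

TOOLS = {                                   # Tool use: actions the agent is allowed to take
    "fetch_overdue_invoices": lambda: ["INV-123", "INV-456"],
    "draft_reminder_emails": lambda: "2 drafts created",
}

def check(result) -> bool:
    """Simple verifier on each step's result."""
    return result not in (None, [], "")

def run_agent(goal: str) -> dict:
    memory = {"goal": goal, "steps": []}    # Memory: track past steps and results
    for step in plan(goal):                 # Goal orientation: only steps that serve the goal
        result = TOOLS[step]()
        if not check(result):               # adapt: here we simply stop and report the failure
            memory["steps"].append((step, "failed"))
            break
        memory["steps"].append((step, result))
    return memory

print(run_agent("Chase overdue invoices"))
```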
If Agentic AI is the **brain** that decides to act, tool use is the **hands** that perform the action. To be truly useful in a business context, AI must be able to interact with the real world by querying databases, triggering workflows, or calling APIs.
💡 **MCP**
The Model Context Protocol was introduced by Anthropic in 2024 and has quickly become the standard for connecting AI to other systems. It's an open protocol that uses JSON-RPC messages for communication between models, tools, and data sources.
Before it, every AI + tool integration was custom. Now MCP tells the AI what tools exist, what they do, and when and how to use them.
MCP is infrastructure (not hype): a way to do model-agnostic AI integration. The MCP server acts as the tool provider. The MCP client is the LLM host that:
• Discovers available tools dynamically
• Understands inputs/outputs via schemas
• Calls tools without hard-coded logic
❌ **Before MCP**
• Every LLM had its own tool API
• Strict coupling between model & tools
• Re-implementation of the same integration over and over
• Difficult to swap LLMs
• Brittle if anything changed
✅ **After MCP**
• Build a connector once
• Swap LLMs without rewriting
• Integration that's easy to configure and change
• Agents become maintainable
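A conceptual sketch (not the full protocol) of what MCP standardises: the server describes its tools with JSON schemas, and any client can discover and call them without per-tool integration logic. The tool name, fields, and stubbed result below are illustrative:

```python
import json

TOOLS = [
    {
        "name": "query_process_data",
        "description": "Run a read-only query against the client's process data model.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

def list_tools() -> str:
    """What a server advertises when a client asks which tools exist."""
    return json.dumps(TOOLS, indent=2)

def call_tool(name: str, arguments: dict) -> dict:
    """Generic dispatch: the client never hard-codes per-tool logic."""
    if name == "query_process_data":
        return {"rows": [{"case_id": "C-1", "throughput_days": 12}]}   # stubbed query result
    raise ValueError(f"unknown tool: {name}")

print(list_tools())
print(call_tool("query_process_data", {"query": "average throughput time per case"}))
```

Swap the model behind the client and nothing on the server side needs to change; that decoupling is the point.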
At Intellifold, we run our own MCP server to create a seamless bridge between our platform, the client's data model, and the LLM. This makes sure the AI isn't "guessing" or returning generic answers. Instead, it works directly with the data it has access to.
The LLM reads the context, transforms business questions into queries, the MCP server executes them, and the LLM returns structured responses. From tough business questions and changing dashboards… to creating new solutions and improvement recommendations.
All made possible through smart design, an understanding of system data, and years of improving business processes.
![Diagram: the Intellifold MCP server connecting the platform, the client data model, and the LLM](https://cdn.prod.website-files.com/6861eb5cbc40991900a5ee95/6979ece1d1176ebd6f93b5d0_MCP_Intellifold.jpg)
Gain full process transparency, spot inefficiencies instantly, and drive automation with real-time analytics and AI-powered monitoring across your business operations.
