About
The monthly Belgium AI, ML & Computer Vision Meetup runs as a distributed event across dozens of AI communities in Europe and globally — the same programme streams simultaneously to multiple local chapters. The May edition brings four talks across computer vision pre-training methodology, agent evaluation frameworks, document AI, and AI infrastructure economics.
Organised by Voxel51 and Jimmy Guerrero. 157 registered attendees from 48 groups this edition. Free to attend — register in advance to get the Zoom link.
Programme
- 18:00 CEST: Welcome & Intro — Jimmy Guerrero (Voxel51)
- 18:05 CEST: Concept-Aware Batch Sampling Improves Language-Image Pretraining — Adhiraj Ghosh (University of Tübingen)
- 18:35 CEST: Do Your Agents Actually Work? Measuring Skills and MCP in Practice — Adonai Vera (Voxel51)
- 19:05 CEST: The Last Mile of OCR [in 2026] — Ankit Khare (LandingAI)
- 19:35 CEST: The Energy Layer of AI: Powering the Next Wave of Inference — Medi Naseri (LōD Technologies)
Speakers
Adhiraj Ghosh
ELLIS PhD Student — University of Tübingen (Bethge Lab)
Adhiraj Ghosh is a first-year ELLIS PhD student at the University of Tübingen, working with Matthias Bethge. His research focuses on training data curation for vision-language models. He holds a BEng in Electrical and Electronics Engineering (Manipal Institute of Technology & SMU Singapore, 2016–2020) and an MSc in Machine Learning from Tübingen (2022–2024).
His talk presents CABS — Concept-Aware Batch Sampling. Current pretraining pipelines for CLIP-style vision-language models use static datasets filtered by model-based quality scores. These pipelines are offline (the dataset is fixed before training starts) and concept-agnostic (filtering doesn’t track which visual concepts the model actually sees). CABS addresses both: it uses DataConcept, a 128-million-image dataset annotated with fine-grained concept composition, to build training batches on the fly that match target concept distributions. Two variants are presented: CABS-DM, which maximises concept coverage, and CABS-FM, which prioritises samples with high object multiplicity. The method shows consistent gains across 28 benchmarks for CLIP/SigLIP-style models, with improvements that transfer to downstream generative models including LLaVA.
Adonai Vera
Machine Learning Engineer & Developer Relations — Voxel51
Adonai Vera builds and explains computer vision and machine learning systems at Voxel51, with over seven years working with TensorFlow, Docker, and OpenCV across production ML projects. He focuses on developer tooling and on bringing rigour to model and agent evaluation.
His talk addresses a gap that teams building agentic systems hit quickly: it’s easy to demo an agent doing something impressive; it’s hard to know whether it works reliably at scale. Adonai walks through how to use FiftyOne Skills and MCP (Model Context Protocol) to evaluate agent performance on real scenarios. The focus is on signals beyond final output quality — latency, token usage, tool-use patterns, and failure modes. The goal: move from “it seemed to work” to measurable, repeatable evaluation that helps teams build more reliable systems.
Ankit Khare
Developer Relations — LandingAI
Ankit Khare has built developer relations functions at a series of AI infrastructure companies: Rockset (real-time analytics, acquired by OpenAI), Twelve Labs (video intelligence, backed by Index Ventures and Radical Ventures), and Abacus.AI (AI assistant platform). Earlier: AI engineer at Third Insight, AI researcher at the LEARN Lab at UT-Arlington working on visual scene understanding and image captioning.
His talk is about where OCR still fails. Benchmarks say the problem is solved; real-world enterprise document work says otherwise. Large tables, old scans, mixed-language documents, handwriting, complex multi-column layouts — this is where even the best-benchmarked models struggle. Ankit presents LandingAI’s Agentic Document Extraction (ADE): an approach that goes beyond OCR to treat document processing as an agentic pipeline problem. The talk covers the core ADE architecture, how to build document processing pipelines with the ADE API/SDK, how AI coding agents can help build those pipelines for you, and how ADE structures its output to give LLMs what they need to reason over complex documents.
Medi Naseri
Founder & CEO — LōD Technologies
Medi Naseri holds a PhD in Electrical Engineering specialising in control and power systems. He founded LōD Technologies to build energy-intelligent infrastructure for flexible data centres and the broader AI compute ecosystem. His work sits at the intersection of AI workloads and power systems engineering — an area that is moving from peripheral concern to critical constraint as inference at scale becomes the dominant AI cost.
His talk explores a shift that is underway but not yet widely understood: the AI industry has moved from training-dominated to inference-dominated economics, and inference at scale is always-on. Always-on agent workloads have very different energy profiles from training runs. Medi shares what his team has learned from R&D on making AI workloads grid-aware and energy-intelligent — dynamically optimising inference jobs against real-time grid constraints. The practical implication: infrastructure decisions made now will define AI’s energy footprint for a decade.
What These Talks Are About
The Training Data Problem for Vision-Language Models
CLIP and its successors (SigLIP, etc.) learn by matching image representations with text representations, trained on hundreds of millions of image-caption pairs scraped from the web. The implicit assumption in most training pipelines is that more data is better, and that quality filtering based on model-predicted scores is sufficient to ensure the model sees a good distribution of visual concepts.
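To make the mechanics concrete, here is a minimal sketch of the symmetric contrastive objective behind CLIP-style training. It assumes image and text embeddings have already been produced by their respective encoders; variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors; row i of each is a matched pair.
    """
    # Normalise so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```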
The problem is that this filtering is static and concept-blind. You can filter for quality but still end up with a training set where certain visual concepts (say, outdoor scenes, human faces, common objects) are massively over-represented and others are almost absent. A model trained on this data will perform well on benchmarks drawn from the same distribution and poorly on anything else. CABS proposes a dynamic alternative: annotate your training pool at the concept level, then construct batches on-the-fly to match whatever concept distribution you want your model to learn from.
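A toy sketch of what on-the-fly, concept-aware batch construction could look like. This illustrates the general idea rather than the CABS algorithm itself; the greedy heuristic and all names are invented for clarity:

```python
import random
from collections import Counter

def sample_concept_aware_batch(pool, target_dist, batch_size, candidates_per_slot=32):
    """Greedily build a batch whose concept histogram tracks target_dist.

    pool: list of (sample_id, concepts) pairs, where concepts is a set of labels.
    target_dist: dict mapping concept -> desired fraction of concept mentions.
    Duplicates are possible in this toy version; a real sampler would track them.
    """
    batch, seen = [], Counter()
    for _ in range(batch_size):
        candidates = random.sample(pool, min(candidates_per_slot, len(pool)))

        def deficit(item):
            _, concepts = item
            total = sum(seen.values()) or 1
            # Prefer samples whose concepts are most under-represented so far.
            return sum(target_dist.get(c, 0.0) - seen[c] / total for c in concepts)

        best = max(candidates, key=deficit)
        batch.append(best[0])
        seen.update(best[1])
    return batch
```

The described variants would slot in here as different scoring functions: a CABS-DM-style sampler would score for breadth of distinct concepts, while a CABS-FM-style sampler would weight samples containing many objects.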
Model Context Protocol (MCP) — What It Is and Why Evaluation Matters
MCP (Model Context Protocol) is an open standard for connecting AI models to external tools, data sources, and services — essentially a standardised way of giving an agent access to capabilities beyond its training data. An agent using MCP can call APIs, query databases, run code, read files, and interact with external systems in a consistent way, regardless of which underlying model is driving the agent.
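As a concrete illustration, the official MCP Python SDK can expose a tool to any MCP-capable agent in a few lines. The server name and the tool below are hypothetical:

```python
from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing one tool. Any MCP-capable agent can discover
# and call it, regardless of which underlying model drives the agent.
mcp = FastMCP("demo-tools")

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order from a (hypothetical) backend."""
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```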
The evaluation challenge is that agents using MCP are not just generating text — they are making sequences of tool-use decisions, and those decisions have costs (latency, tokens, API calls) and failure modes that don’t show up in final-output quality metrics. Evaluating agents properly requires tracking the whole interaction: which tools were called, in what order, with what inputs, and whether the sequence was efficient and correct. This is what FiftyOne Skills enables — structured agent evaluation with visibility into the process, not just the output.
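The sketch below is a generic, framework-agnostic illustration of the kind of trace this requires (it is not the FiftyOne Skills API): record every tool call with its latency and outcome, not just the final answer.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    arguments: dict
    latency_s: float
    ok: bool

@dataclass
class AgentTrace:
    """Process-level record of one agent run: every tool call, not just the output."""
    calls: list = field(default_factory=list)
    tokens_used: int = 0

    def record(self, tool, arguments, fn):
        # Wrap a tool invocation so latency and failure are always captured.
        start = time.perf_counter()
        try:
            result, ok = fn(), True
        except Exception:
            result, ok = None, False
        self.calls.append(ToolCall(tool, arguments, time.perf_counter() - start, ok))
        return result

    def summary(self):
        # The aggregate signals that final-output metrics miss.
        return {
            "n_calls": len(self.calls),
            "total_latency_s": sum(c.latency_s for c in self.calls),
            "failure_rate": sum(not c.ok for c in self.calls) / max(len(self.calls), 1),
            "tokens_used": self.tokens_used,
        }
```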
Why Document AI is Still Hard
Standard OCR (optical character recognition) converts text in images to machine-readable text. Modern deep learning-based OCR is very accurate on clean, well-structured documents. The challenge is that most enterprise documents are not clean and well-structured: they contain tables with merged cells, handwritten annotations, stamps overlapping printed text, multi-column layouts with inconsistent font sizes, diagrams interspersed with text blocks, and mixed languages. These are not edge cases — they are the majority of documents in real industrial document workflows.
Agentic Document Extraction reframes the problem: instead of treating a document as a collection of text regions to be transcribed, it treats document understanding as a reasoning task. The agent identifies document structure, interprets relationships between regions (this is a table header, this is a footnote referencing that table, this figure caption refers to the preceding image), and produces structured output that an LLM can reason over. The goal is not perfect OCR — it is actionable, structured document intelligence.
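For illustration only, structured output of this general shape is what makes downstream LLM reasoning tractable. This is not LandingAI’s actual ADE schema:

```python
# Illustrative only: the shape of structured output an agentic extractor
# might hand to an LLM. Region types, IDs, and fields are invented.
document = {
    "pages": 2,
    "regions": [
        {"id": "t1", "type": "table", "page": 1,
         "header": ["Item", "Qty", "Unit price"],
         "rows": [["Valve assembly", "4", "112.50"]]},
        {"id": "f1", "type": "footnote", "page": 1,
         "text": "Prices exclude VAT.", "refers_to": "t1"},
        {"id": "p1", "type": "paragraph", "page": 2,
         "text": "Delivery within 30 days of purchase order."},
    ],
}

# Because relationships are explicit, an LLM can answer grounded questions:
# "what does footnote f1 qualify?" resolves via refers_to -> table t1.
```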
The Energy Constraint Nobody Was Talking About (Until Now)
During the deep learning revolution (2012–2022), the dominant energy story was about training: large models required enormous compute runs, and the carbon footprint of a single GPT-3-scale training run attracted justified criticism. Inference — actually running a trained model to produce outputs — was comparatively cheap and intermittent. This has changed.
With the shift to agentic, always-on AI systems, inference is no longer a batch operation. An agent running continuously at scale consumes energy 24/7. The aggregate energy demand from inference is beginning to rival training, and this is before the next wave of AI deployment in physical systems (robotics, autonomous vehicles, industrial IoT). The question of how to make inference workloads grid-aware — scheduling compute to follow renewable energy availability, right-sizing infrastructure to reduce idle draw — is transitioning from an environmental concern to an operational cost management challenge.
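As a toy illustration of the scheduling idea for the deferrable slice of inference work, the sketch below picks the lowest-carbon execution window from an hourly forecast. The forecast source is left abstract, and all names are assumptions:

```python
def pick_execution_window(job_hours, carbon_forecast):
    """Choose the start slot minimising average grid carbon intensity.

    carbon_forecast: list of (hour_label, gCO2_per_kWh) pairs over the next
    24 hours, e.g. from a grid operator or carbon-intensity feed.
    job_hours: how many consecutive hours the deferrable inference job needs.
    """
    best_start, best_cost = 0, float("inf")
    for start in range(len(carbon_forecast) - job_hours + 1):
        window = carbon_forecast[start:start + job_hours]
        # Average intensity over the candidate window.
        cost = sum(intensity for _, intensity in window) / job_hours
        if cost < best_cost:
            best_start, best_cost = start, cost
    return carbon_forecast[best_start][0], best_cost
```

Truly always-on workloads cannot be deferred this way, which is why the talk frames the harder problem as making the inference layer itself grid-aware.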
Attend
Organised by
Belgium AI Machine Learning and Computer Vision Meetup
A distributed monthly meetup running simultaneously across AI communities throughout Europe and globally, coordinated by Voxel51. Each edition brings 3–4 technical talks on current AI and computer vision research. The Belgium chapter is one of 48 groups that host the May edition.
Meetup Group ›