<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:yandex="http://news.yandex.ru" xmlns:turbo="http://turbo.yandex.ru" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Talks</title>
    <link>http://codepointconf.com</link>
    <description/>
    <language>ru</language>
    <lastBuildDate>Fri, 03 Apr 2026 01:53:17 +0300</lastBuildDate>
    <item turbo="true">
      <title>The Evolution of Product Management in the Age of AI*</title>
      <link>http://codepointconf.com/tpost/6d0o0nrx01-the-evolution-of-product-management-in-t</link>
      <amplink>http://codepointconf.com/tpost/6d0o0nrx01-the-evolution-of-product-management-in-t?amp=true</amplink>
      <pubDate>Sun, 15 Mar 2026 22:58:00 +0300</pubDate>
      <author>Artem Miloserdov</author>
      <enclosure url="https://static.tildacdn.com/tild3137-6334-4365-b661-383839663134/The_Evolution_of_PM_.jpg" type="image/jpeg"/>
      <description>Artem Miloserdov explores how AI is reshaping the Product Manager role — from prioritization specialist to strategic translator between business needs and AI capabilities, and why strong product thinking matters more than ever.</description>
      <turbo:content><![CDATA[<header><h1>The Evolution of Product Management in the Age of AI*</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3137-6334-4365-b661-383839663134/The_Evolution_of_PM_.jpg"/></figure><div class="t-redactor__text">This talk <strong>was not recorded</strong> and is not available on YouTube due to NDA restrictions.<br />The presentation is available <a href="https://drive.google.com/file/d/11VPhL7z2cFxbaIXygPxPplu_s-p2OX1Q/view?usp=sharing">here</a>.<br /><br />Artem Miloserdov focused on how the role of the Product Manager is changing as AI moves from a supporting tool to a core part of product strategy, execution, and decision-making. The talk explored how PMs are no longer operating only as prioritization and coordination specialists, but increasingly as translators between business needs, technical capabilities, and AI-enabled product opportunities.<br /><br />A key theme of the presentation was that AI is not simply adding another feature layer to digital products — it is changing the way products are imagined, built, and measured. In that context, the PM role is evolving as well. Discovery becomes faster, experimentation becomes cheaper, and teams can validate ideas earlier than before. At the same time, PMs now have to work with new uncertainties: non-deterministic systems, model limitations, quality trade-offs, and questions around trust, user experience, and business value.<br /><br />The talk also highlighted the tension between speed and judgment. While AI can accelerate research, prototyping, analysis, and documentation, it does not replace product thinking. PMs still need to define the actual problem, understand users, make trade-offs, and align product direction with the realities of the business. 
In many ways, AI increases the need for strong product leadership rather than reducing it.<br /><br />Overall, the session framed the modern PM as someone who must become more adaptive, more technical in understanding AI capabilities, and more strategic in deciding where AI genuinely creates value versus where it only creates noise. For product teams navigating AI adoption, the talk served as a strong reminder that the PM role is not disappearing; it is being redefined.</div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>AI Is Fast. Infrastructure Is Not: The Latency Problem in Real-Time AI</title>
      <link>http://codepointconf.com/tpost/0ate24nl61-ai-is-fast-infrastructure-is-not-the-lat</link>
      <amplink>http://codepointconf.com/tpost/0ate24nl61-ai-is-fast-infrastructure-is-not-the-lat?amp=true</amplink>
      <pubDate>Sun, 15 Mar 2026 22:58:00 +0300</pubDate>
      <author>Amir Adigamov</author>
      <enclosure url="https://static.tildacdn.com/tild3737-3566-4430-b138-666630623465/The_Latency_Problem_.jpg" type="image/jpeg"/>
      <description>Amir Adigamov reveals why real-time AI still feels slow — tracing latency beyond the model to the full infrastructure stack, with practical fixes for voice, gaming, and XR applications.</description>
      <turbo:content><![CDATA[<header><h1>AI Is Fast. Infrastructure Is Not: The Latency Problem in Real-Time AI</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3737-3566-4430-b138-666630623465/The_Latency_Problem_.jpg"/></figure><div class="t-redactor__text">You can watch this talk on YouTube in <a href="https://youtu.be/J6qWuhS8PI8?si=gd1CdVO8XL-9bItJ&amp;t=1483">Russian</a> and <a href="https://youtu.be/SmSiL4LUo8w?si=TFsfcWlKdJ1ev3MY&amp;t=1484">English</a> (auto-dubbed).<br />The presentation is available <a href="https://drive.google.com/file/d/1lJ1TxNLK6X_KJwAymhj-OAFbQi5qAWUp/view?usp=sharing">here</a>.<br /><br />Amir Adigamov’s talk focused on one of the most visible gaps in modern AI products: the models are getting faster and more powerful, but many real-time AI experiences still feel slow to the user. His presentation unpacked that contradiction and showed that the real issue often lies not in the model itself, but in the surrounding infrastructure stack.<br /><br />He began by defining what he meant by <strong>real-time AI</strong>: systems such as voice interfaces, game-related AI interactions, extended reality scenarios, or any user-facing experience where a person expects an immediate response. In those environments, even modest latency creates friction. The talk argued that while AI engines are improving at an extraordinary pace, the rest of the system often remains constrained by older architectural assumptions and slower data-handling layers.<br /><br />A major theme of the session was that AI inference is only one stage in a much larger pipeline. Between the user request and the first useful answer, a system may incur network latency, queueing delays, business logic overhead, retrieval and ranking steps, serialization costs, vector database search time, and token generation time. 
Amir broke down this “anatomy of latency” in detail, explaining how even when the inference engine is fast, the user may still experience substantial delay because of all the surrounding layers. This was especially relevant in retrieval-augmented systems, where pre-processing and retrieval often consume a surprising portion of the response budget before generation even begins.<br /><br />He also drew a distinction between <strong>time to first token</strong> and the rest of the generation process. For many systems, the first-token delay is the most important user experience threshold because it determines whether the interaction feels responsive. Once a system begins streaming output, users are more forgiving. But before that point, every step in the stack matters. The talk made it clear that optimizing real-time AI means optimizing the entire system path, not just the model.<br /><br />Another practical and well-received part of the session covered architectural choices that affect latency. Amir contrasted older patterns such as REST and JSON serialization with faster alternatives like gRPC, protobuf-based communication, and more efficient binary formats. He also discussed memory and hardware design considerations, especially around token generation bottlenecks, and showed why throughput and responsiveness can be constrained by memory access patterns as much as by compute.<br /><br />One of the strongest concrete examples in the talk involved voice AI architecture. He described a more traditional implementation in which the system waits for a user to finish speaking, then performs retrieval and ranking before responding. He contrasted that with a lower-latency architecture using a semantic cache, a fast-path agent for common requests, and a slower background process that prefetches likely relevant information while the user is still speaking. 
This kind of split architecture can dramatically reduce perceived latency by avoiding unnecessary waits and making the response path more proactive.<br /><br />The practical recommendations at the end of the talk tied these ideas together: keep the path to the model short, avoid unnecessary legacy overhead, keep warm state where it matters, use caching aggressively when possible, and stream responses early instead of waiting for the whole pipeline to complete. The broader message was clear: users do not interact with isolated models — they interact with systems. And if those systems are architected poorly, no model alone will save the experience.</div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>Hybrid Trust Systems: Combining AI and Physics for Critical Industrial Monitoring</title>
      <link>http://codepointconf.com/tpost/vpp8zmejt1-hybrid-trust-systems-combining-ai-and-ph</link>
      <amplink>http://codepointconf.com/tpost/vpp8zmejt1-hybrid-trust-systems-combining-ai-and-ph?amp=true</amplink>
      <pubDate>Sun, 15 Mar 2026 22:58:00 +0300</pubDate>
      <author>Andrii Syrotenko</author>
      <enclosure url="https://static.tildacdn.com/tild3839-6134-4335-b064-343162376132/Hybrid_Trust_Systems.jpg" type="image/jpeg"/>
      <description>Andrii Syrotenko explains why AI alone isn't enough for critical industrial environments — and how combining computer vision with physics-based validation creates more reliable, false-positive-resistant systems in oil, gas, and manufacturing.</description>
      <turbo:content><![CDATA[<header><h1>Hybrid Trust Systems: Combining AI and Physics for Critical Industrial Monitoring</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3839-6134-4335-b064-343162376132/Hybrid_Trust_Systems.jpg"/></figure><div class="t-redactor__text">You can watch this talk on YouTube in <a href="https://www.youtube.com/watch?v=J6qWuhS8PI8&amp;t=2570s">Russian</a> and <a href="https://youtu.be/SmSiL4LUo8w?si=ZFC08mh6IZhdUV24&amp;t=2567">English</a> (auto-dubbed).<br />The presentation is available <a href="https://docs.google.com/presentation/d/142PlR_sYF-WqV0j-MlLlKYJQzwstLdSX/edit?usp=sharing&amp;rtpof=true&amp;sd=true">here</a>.<br /><br />Andrii Syrotenko’s presentation addressed one of the most important questions in industrial AI: what do you do when AI alone is not trustworthy enough for critical environments? Drawing on experience from industrial analytics and computer vision in oil and gas, manufacturing, and aviation-related systems, he showed why real-world industrial monitoring cannot rely purely on model confidence scores or lab-trained behavior.<br /><br />A major part of the talk focused on the challenge of <strong>false positives</strong> in real industrial settings. Even strong computer vision models can perform well in controlled environments and still fail once they are exposed to changing weather, lighting, reflections, steam, background interference, or production-specific visual noise. In gas detection, for example, steam can resemble gas in spectral imagery; glare and reflections can trigger false alarms; and in manufacturing, even something as simple as an oil drop can be misclassified as a crack or defect. In critical infrastructure, those false positives are not merely inconvenient; they can stop production, overwhelm operators, and ultimately reduce trust in the system.<br /><br />That issue led directly to the core idea of the session: hybrid trust systems. 
Instead of depending on AI alone, Andrii described how his teams introduced a second validation layer based on <strong>physics and motion analysis</strong>. After the AI system detects a possible event, a lightweight physics-based pipeline checks whether the event is physically plausible. This includes tracking motion vectors across frames, analyzing optical flow, looking at direction and speed, and comparing what is observed against known physical behavior. For example, vapor rises in a certain way, gases do not move at impossible speeds, and certain visual artifacts behave differently from actual leaks or defects. When those physical constraints contradict the AI prediction, the confidence can be reduced before the alert reaches an operator.<br /><br />Another important point in the talk was that many industrial environments are constrained in ways typical cloud AI systems are not. These systems often run with limited hardware, restricted connectivity, and little or no internet access for security reasons. Models cannot always be continuously retrained in production, and organizations are often unable or unwilling to share sensitive operational data. Because of that, it is not realistic to assume that more data or more retraining will solve every problem. In such environments, combining AI with domain physics is not just elegant — it is practical.<br /><br />During the discussion, the audience asked which other industries could benefit from this kind of hybrid trust approach beyond the examples shown in the talk. Andrii’s answer was broad and important: essentially any domain where AI interacts with the physical world can benefit, especially where computer vision, timing, motion, or material behavior matter. While he mentioned that he was not deeply involved in autonomous driving himself, he pointed out that similar ideas are highly relevant there as well. 
The broader takeaway was that whenever the environment obeys physical laws and the consequences of mistakes are high, AI should be grounded by those laws rather than left to operate alone.</div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>Prompt Engineering: Better Results with Less Trial and Error</title>
      <link>http://codepointconf.com/tpost/d805e3eky1-prompt-engineering-better-results-with-l</link>
      <amplink>http://codepointconf.com/tpost/d805e3eky1-prompt-engineering-better-results-with-l?amp=true</amplink>
      <pubDate>Mon, 16 Mar 2026 01:45:00 +0300</pubDate>
      <author>Ali Kuzhuget</author>
      <enclosure url="https://static.tildacdn.com/tild3338-6431-4338-b865-616463633563/Prompt_Engineering.jpg" type="image/jpeg"/>
      <description>Ali Kuzhuget breaks down prompt engineering from basics to advanced techniques — showing how structured, intentional prompting improves output quality, reduces costs, and prevents hallucinations across text, image, video, and music generation.</description>
      <turbo:content><![CDATA[<header><h1>Prompt Engineering: Better Results with Less Trial and Error</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3338-6431-4338-b865-616463633563/Prompt_Engineering.jpg"/></figure><div class="t-redactor__text">You can watch this talk on YouTube in <a href="https://youtu.be/J6qWuhS8PI8?si=_CxgKP-Z6qzvdCll&amp;t=3503">Russian</a> and <a href="https://youtu.be/SmSiL4LUo8w?si=D6yrpdKQu5FXDXae&amp;t=3514">English</a> (auto-dubbed).<br />The presentation is available <a href="https://drive.google.com/file/d/17YG7Gt5UYbEHSGzrdbjiBhsm4QtYtbVW/view?usp=sharing">here</a>.<br /><br />Ali Kuzhuget’s presentation focused on a foundational but often underestimated topic: prompt engineering. His central point was simple and practical. Many people treat prompting as something obvious or informal, but mastering it can significantly improve output quality, reduce wasted iterations, save tokens, and lower the overall cost of working with AI systems.<br /><br />The talk covered several core prompting approaches, beginning with <strong>zero-shot prompting</strong>, where a user asks for something directly without giving examples or structure. Ali explained that while this is the most common way people interact with AI systems, it often produces weak or incomplete results because the model lacks sufficient context. From there, he moved to <strong>few-shot prompting</strong>, where examples are provided so the model can infer the desired pattern. He then covered more guided reasoning approaches such as <strong>chain-of-thought</strong>, where the model is led through a problem in explicit intermediate steps, with clarifications along the way, until the result becomes more precise.<br /><br />One of the strengths of the session was its emphasis on the fact that prompting is not universal across tools. Different models and platforms have different strengths, preferences, and output behavior. 
Ali used examples from image generation, music generation, video generation, and coding workflows to show that understanding the model’s “reading style” matters. A vague prompt such as “a man in a suit” may produce a generic result, while a much more structured prompt that specifies style, framing, lighting, background, camera angle, and intended output context can dramatically improve quality. The same logic applies to music, where genre, mood, structure, and musical language influence outcomes, and to video, where timing, motion continuity, and transitions become especially important.<br /><br />He also spent time on the practical cost of poor prompting. Bad prompts do not just produce disappointing outputs — they consume retries, burn tokens, and waste time. That is especially painful in multimodal systems where generation is more expensive and post-processing is harder. In video generation, for example, he pointed out that prompts need to anticipate what happens in the next step of the workflow, because abrupt movement or poor ending states make clips difficult to stitch together later. This was an important reminder that good prompt design often means thinking not only about the immediate output, but also about how that output will be used downstream.<br /><br />Another major theme of the talk was <strong>grounding</strong>. Ali connected prompt quality to hallucination reduction by emphasizing the importance of supplying reliable source material, system-level instructions, and structured context when factual accuracy matters. He referred to notebook-style and retrieval-grounded workflows as examples of how outputs become more stable when the model is anchored to defined information sources instead of improvising from its generic training patterns.<br /><br />He also briefly touched on model settings such as temperature and sampling behavior, noting that these influence creativity, variability, and factual stability. 
That helped place prompting in a broader context: good outputs depend not only on the words in the prompt, but also on the model’s configuration and intended use.<br /><br />The overall tone of the talk was practical and experience-driven. Ali openly reflected on the fact that trying to discover everything independently can be costly, and encouraged the audience to learn from official guides, platform tutorials, and established best practices rather than assuming prompting can be mastered purely through intuition. While the audience discussion was short, that practical framing clearly resonated: one of the immediate follow-up requests was for him to share the presentation so attendees could revisit the examples and prompting patterns afterward.</div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>When AI Says “Migrate” but the Business Says “Wait”: Value-Based Architecture in the Age of GenAI</title>
      <link>http://codepointconf.com/tpost/s3zdgbum51-when-ai-says-migrate-but-the-business-sa</link>
      <amplink>http://codepointconf.com/tpost/s3zdgbum51-when-ai-says-migrate-but-the-business-sa?amp=true</amplink>
      <pubDate>Mon, 16 Mar 2026 01:50:00 +0300</pubDate>
      <author>Vladislav Hîncu</author>
      <enclosure url="https://static.tildacdn.com/tild3564-6165-4764-b163-633763316662/When_AI_Says_Migrate.jpg" type="image/jpeg"/>
      <description>Vladislav Hîncu explores why AI-generated architecture recommendations must be filtered through business context, hard constraints, and operational realities — and why human judgment remains essential to making them truly valuable.</description>
      <turbo:content><![CDATA[<header><h1>When AI Says “Migrate” but the Business Says “Wait”: Value-Based Architecture in the Age of GenAI</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild3564-6165-4764-b163-633763316662/When_AI_Says_Migrate.jpg"/></figure><div class="t-redactor__text">You can watch this talk on YouTube in <a href="https://www.youtube.com/watch?v=SmSiL4LUo8w">English</a>.<br />The presentation is available <a href="https://docs.google.com/presentation/d/1Pcthglj3N0R-AD9oYU64JMC3qVXvytWS/edit?usp=sharing&amp;rtpof=true&amp;sd=true">here</a>.<br /><br />Vladislav Hîncu’s session tackled a very current architectural dilemma: AI tools can now analyze large codebases, suggest modernization strategies, estimate migration plans, and produce recommendations much faster than traditional review processes, but that speed does not mean those recommendations are automatically right for the business.<br /><br />The talk began with a balanced assessment of what GenAI can already do well in architecture and modernization work. It can parse large systems, identify patterns, classify architecture styles, suggest migrations, estimate technical effort, and compare certain trade-offs quickly. In other words, AI has become a powerful technical advisor. But the central message of the talk was that these tools still cannot see what matters most in many enterprise decisions: the business model, operational realities, regulatory obligations, local constraints, political dynamics, or the hidden assumptions that make one architecture viable and another dangerous.<br /><br />To illustrate that gap, Vladislav shared a detailed retail example. An AI architecture tool analyzed a legacy point-of-sale system and recommended a full cloud migration with a clean microservices-based architecture. On paper, the recommendation looked excellent: better performance, lower infrastructure costs, and a shorter implementation timeline. 
But one crucial question changed everything: what happens when a store loses internet connectivity? For a global retailer operating across many countries and conditions, stores needed to continue functioning offline. A cloud-only architecture would have created a direct business risk, because stores would be unable to process transactions during outages. The talk made the point vividly: a technically impressive migration recommendation can still be catastrophic if it ignores a non-negotiable business invariant.<br /><br />From there, the session introduced the distinction between ordinary trade-offs and <strong>hard business constraints</strong>. AI tends to frame architecture decisions as trade-offs among cost, performance, scalability, and reliability. But not everything is negotiable. Some requirements are hard boundaries: data must stay in a specific region, stores must function offline, a payroll integration cannot simply be replaced, or a latency threshold is tied to the business model itself. These are not just technical preferences. They define whether the business can continue operating.<br /><br />Another major contribution of the talk was Vladislav’s explanation of <strong>true value</strong>. AI-generated architecture recommendations often highlight expected benefits, but they do not fully account for persistent operational costs, increased complexity, new staffing needs, debugging overhead, distributed failure modes, training burden, or the business cost of losing useful existing capabilities. A migration may look elegant in a slide deck and still become more expensive and more fragile over time. He gave the example of systems broken into dozens of microservices where initial progress looked impressive, but long-term ownership costs and operational complexity eventually outweighed the gains.<br /><br />To help make these decisions more concrete, the talk proposed a practical framework:<br /><br />
1. identify business invariants before reviewing AI recommendations,<br /><br />2. check every recommendation against those invariants,<br /><br />3. calculate full value rather than accepting surface-level benefit estimates.<br /><br />That framework formed the core of the talk’s argument: AI should be treated as an input into decision-making, not as the final decision-maker.<br /><br />The audience questions reinforced that theme. One attendee asked how to convince stakeholders that waiting can actually be the riskier option. Vladislav’s answer emphasized documentation and explicit risk framing: architects need to clearly record the constraints, risks, and possible outcomes so that business leaders understand what they are choosing and can own the trade-offs consciously. Another attendee asked whether AI would eventually replace platform or DevOps engineers. His response was nuanced: AI will likely replace or automate work that follows clear, repeatable patterns, but it will not replace roles centered on judgment and responsibility. Where decisions have business consequences, human accountability remains essential. In response to a final question about the main takeaway from the session, Vladislav summarized the talk as a call for collaboration rather than competition: the architect’s role is evolving from pure technical design into a bridge between AI-generated technical options and business reality.<br /><br />Overall, this was one of the strongest strategy talks of the conference. It did not reject AI. Quite the opposite: it argued that AI is already powerful and useful, but that its recommendations become truly valuable only when filtered through business context, operational constraints, and accountable human judgment.</div>]]></turbo:content>
    </item>
    <item turbo="true">
      <title>Optimizing Workflows at Scale</title>
      <link>http://codepointconf.com/tpost/dgdh3ktad1-optimizing-workflows-at-scale</link>
      <amplink>http://codepointconf.com/tpost/dgdh3ktad1-optimizing-workflows-at-scale?amp=true</amplink>
      <pubDate>Sun, 15 Mar 2026 05:01:00 +0300</pubDate>
      <author>Nikolai Sidiropulo</author>
      <enclosure url="https://static.tildacdn.com/tild6261-3361-4037-a135-383561653937/Workflows.png" type="image/png"/>
      <description>Nikolai Sidiropulo makes the case for treating AI workflow performance as a business priority — covering instrumentation, tail latency, and observability practices that reveal hidden bottlenecks before they become operational problems.</description>
      <turbo:content><![CDATA[<header><h1>Optimizing Workflows at Scale</h1></header><figure><img alt="" src="https://static.tildacdn.com/tild6261-3361-4037-a135-383561653937/Workflows.png"/></figure><div class="t-redactor__text">You can watch this talk on YouTube in <a href="https://www.youtube.com/watch?v=J6qWuhS8PI8">Russian</a> and <a href="https://www.youtube.com/watch?v=SmSiL4LUo8w">English</a> (auto-dubbed).<br /><br />Nikolai Sidiropulo’s talk focused on a topic that often gets less attention than model quality or token cost: <strong>performance at scale</strong>. His central point was that as AI systems move from experimentation into production, teams eventually reach a stage where optimization is no longer optional. Once there is real business value, real user volume, and real operational dependency on these workflows, latency and workflow performance become business issues rather than just technical ones.<br /><br />He framed optimization around three familiar software dimensions: <strong>reliability, efficiency, and performance</strong>. In AI systems, reliability is widely discussed because of non-determinism and the need for evals, prompt tuning, and output quality checks. Efficiency is also visible because token usage translates directly into dollars. But performance, he argued, is often under-discussed, even though in many real environments it determines whether the system is truly usable. A workflow that is technically correct but too slow can still become a bottleneck for the customer’s entire operation.<br /><br />To make that point concrete, Nikolai described a healthcare-oriented workflow where a nurse uses a system during patient discharge. In this kind of environment, the user is not casually multitasking while waiting for the AI to finish in the background. The process is linear, operational, and time-sensitive: the nurse stands there waiting for the system to complete its work before moving on. 
In such scenarios, even an extra minute of latency compounds quickly across many users and many cases. What looks like a small delay at the system level becomes a throughput problem at the organizational level.<br /><br />A major focus of the talk was <strong>instrumentation and observability</strong>. Nikolai emphasized that assumptions about performance are often wrong unless they are validated against actual production data. Local benchmarks and isolated tests rarely reflect the complexity of real-world usage. His recommendation was to instrument critical workflows, use tracing systems such as OpenTelemetry and platforms like Langfuse, and analyze data through percentiles rather than averages. In particular, he stressed the importance of tail latency — looking at P90, P95, or P99 behavior rather than just median response times — because the worst-performing slice of requests often reveals the most meaningful bottlenecks.<br /><br />One of the strongest practical insights from the session was the need to enrich traces with <strong>business-specific attributes</strong>, not just standard technical metadata. In the example he shared, adding contextual information such as state or region made it possible to identify asymmetric performance patterns. Some workflows were slower in certain locations because the agent was dynamically pulling extra regulatory context from external sources. Once that pattern became visible, the team could preload and cache the relevant information instead of forcing the agent to fetch it on demand. This improved latency, reduced token waste, and opened the door to better model choices at lower cost.<br /><br />During Q&amp;A, the audience asked for concrete examples of bottlenecks encountered while scaling workflows. 
Nikolai pointed back to exactly this kind of hidden asymmetry: issues that are not obvious in aggregate metrics but become clear once traces are grouped by meaningful business dimensions such as geography, time of day, or operational load. Another audience question asked what modern AI system architecture looks like in practice. In response, he outlined a progression from simpler systems to more complex ones: first, workflows that can still be solved algorithmically; then single-model-call systems; then more structured pipelines; and finally multi-agent and sub-agent architectures. His answer was careful and pragmatic — the more moving parts a system has, the more points of failure, non-determinism, and compounded risk it introduces.<br /><br />There was also an interesting discussion around whether DevOps can be seen as a foundation for AI systems. Nikolai’s answer was that many existing software and infrastructure practices are absolutely relevant, but AI introduces a different layer of complexity because behavior is probabilistic rather than strictly deterministic. Observability, cost tracking, and operational discipline still matter, but standard methods do not map perfectly onto AI workloads, particularly when it comes to evaluating quality, understanding latency distributions, and tracing non-deterministic execution paths.<br /><br />Overall, the talk offered a mature production-focused view of AI systems. Rather than treating optimization as a one-time tuning exercise, Nikolai described it as a recurring loop: instrument, observe, analyze, optimize, and repeat. For teams already operating AI workflows at meaningful scale, it was a strong reminder that the biggest opportunities often appear not in average performance, but in the edge cases and hidden clusters where the business feels the pain first.<br /><br />Presenter's email: nasidiropulo@gmail.com</div>]]></turbo:content>
    </item>
  </channel>
</rss>
