
Your Java AI Agent Isn't Dumb. Your Context Is.

57% of organizations now run AI agents in production, and one in three cite quality as their top barrier. Most teams respond by switching models. The actual fix is usually one of these 5 context engineering mistakes, and I've made most of them myself.

Written by Kiryl Rusanau


I built my first LangChain4j agent thinking the hard part was picking the right model. Spoiler: the model was fine from day one. What kept breaking was everything the model didn't see, or saw wrong, or saw at the wrong moment.

Turns out this is common. According to LangChain's State of Agent Engineering report (2025), which surveyed 1,340 professionals, 57% of organizations have AI agents in production, and one in three cite quality as their top barrier. Most of those failures trace back not to which model you picked, but to how you structured what the model sees.

There's a name for this: Context Engineering.

  • Prompt Engineering is how you ask the model.
  • Context Engineering is what the model sees when you ask — what data, from where, in what format, at what moment.

For Java developers, this shows up in 5 recurring mistakes. I've made most of them myself.


Mistake 1: The giant system prompt

You know the pattern. System prompt grows over time. Team keeps adding requirements. Edge cases accumulate. Six months later you have a 4,000-token document that reads like a terms of service nobody asked for.

The model technically "sees" everything. But seeing 4,000 tokens of static context is not the same as processing relevant information at the right moment.

The fix is just-in-time loading. Don't predict what the agent needs. Give it ways to retrieve what it needs, when it needs it.

// Don't — static blob loaded upfront
String systemPrompt = loadEntireKnowledgeBase(); // 5000 tokens

// Do — retrieve dynamically at query time
RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .contentRetriever(EmbeddingStoreContentRetriever.builder()
        .embeddingStore(vectorStore)
        .maxResults(5)
        .minScore(0.75)  // this line matters — see Mistake 2
        .build())
    .build();

Practitioners building production agents generally report that roughly 20% of the final context is static instructions, while 80% arrives dynamically based on what the user actually asked. Static prompts don't scale — retrieval does.
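To make the just-in-time idea concrete without any framework, here's a minimal plain-Java sketch. The class, the toy knowledge base, and the keyword matching are all illustrative assumptions, not anyone's API; a real system would use embeddings, as in the retriever above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of just-in-time context loading: instead of concatenating the
// whole knowledge base into the prompt, look up only the entries relevant
// to the current question. Everything here is a toy stand-in.
public class JustInTimeContext {

    // Toy "knowledge base": topic -> snippet.
    private final Map<String, String> knowledgeBase = Map.of(
        "refunds",  "Refunds are processed within 5 business days.",
        "shipping", "Standard shipping takes 3-7 days.",
        "returns",  "Items can be returned within 30 days."
    );

    // Static approach: every snippet goes into the prompt, every time.
    public String staticContext() {
        return String.join("\n", knowledgeBase.values());
    }

    // Just-in-time approach: only snippets whose topic appears in the query.
    public List<String> retrieve(String query) {
        String q = query.toLowerCase();
        return knowledgeBase.entrySet().stream()
            .filter(e -> q.contains(e.getKey()))
            .map(Map.Entry::getValue)
            .collect(Collectors.toList());
    }
}
```

The point of the sketch: the static path always pays for all three snippets, while the retrieval path scales with relevance, not with the size of the knowledge base.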


Mistake 2: Missing minScore

This one is subtle and causes confident hallucinations.

Without minScore, your retriever returns the top-k results by similarity — even if none of them are actually relevant. Your agent gets slightly-wrong context, reasons from it confidently, and produces a wrong answer that sounds right.

In a banking environment this is not acceptable. A retriever returning UK compliance rules when the client asked about Polish regulations is a real problem.

EmbeddingStoreContentRetriever.builder()
    .embeddingStore(vectorStore)
    .maxResults(5)
    .minScore(0.75)   // nothing rather than irrelevant garbage
    .build()

0.75 is not a magic number. Tune it for your domain. But starting without it is a guaranteed path to context pollution.
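If it helps to see the mechanics, here's a toy illustration of what a minScore cutoff does inside a retriever. The `ScoredDoc` record and the scores are made up; the shape of the logic is the point: below-threshold results are dropped entirely, so the agent gets nothing rather than plausible-but-wrong context.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy version of top-k retrieval with a minimum similarity score.
public class MinScoreFilter {

    public record ScoredDoc(String text, double score) {}

    public static List<String> topK(List<ScoredDoc> candidates,
                                    int maxResults, double minScore) {
        return candidates.stream()
            .filter(d -> d.score() >= minScore)           // the minScore cut
            .sorted((a, b) -> Double.compare(b.score(), a.score()))
            .limit(maxResults)                            // then top-k
            .map(ScoredDoc::text)
            .collect(Collectors.toList());
    }
}
```

Without the filter line, the UK compliance document at 0.55 similarity would still make it into the context of a question about Polish regulations, simply because it was among the top k.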


Mistake 3: Shared ChatMemory across users

I see this in almost every first Java AI implementation.

// This is a data leakage bug
ChatMemory sharedMemory = MessageWindowChatMemory.withMaxMessages(20);

@AiService
interface SupportAgent {
    String chat(String question); // all users share one memory
}

User A's session bleeds into User B's context. At best, irrelevant responses. At worst, user data leaking between sessions.

The fix is one annotation:

@AiService
interface SupportAgent {
    String chat(@MemoryId String userId, @UserMessage String question);
}

LangChain4j then manages a separate ChatMemory instance per unique userId. Spring AI handles this with explicit conversation IDs passed to ChatMemory. Different API, same principle.

In any regulated industry, per-user memory scoping is not a nice-to-have. It's a compliance requirement.
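Conceptually, what @MemoryId buys you is one bounded message window per user, never shared. Here's a hedged plain-Java sketch of that idea; the class and method names are hypothetical, not LangChain4j internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-user memory scoping: a separate, bounded message window
// per user ID. Illustrative only.
public class PerUserMemory {

    private final int maxMessages;
    private final Map<String, Deque<String>> memories = new ConcurrentHashMap<>();

    public PerUserMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    public void add(String userId, String message) {
        Deque<String> window = memories.computeIfAbsent(userId, id -> new ArrayDeque<>());
        window.addLast(message);
        if (window.size() > maxMessages) {
            window.removeFirst(); // evict oldest message, keep the window bounded
        }
    }

    public List<String> history(String userId) {
        return List.copyOf(memories.getOrDefault(userId, new ArrayDeque<>()));
    }
}
```

Two properties matter here, and both are what the shared-memory version breaks: no message ever crosses user boundaries, and each window is independently bounded.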


Mistake 4: Treating @Tool descriptions as Javadoc

The tool description is not documentation for humans. It is the context the model uses to decide which tool to call.

// The model will make poor choices with this
@Tool("Get customer data")
public Customer getCustomer(String id) { ... }

// Better
@Tool("Retrieve customer account details: balance, status, last 5 " +
      "transactions. Use when the user asks about their account or " +
      "recent activity. Returns null if customer ID not found.")
public Customer getCustomer(String id) { ... }

I rewrote every tool description in my agent after debugging a case where it consistently picked the wrong retrieval method. The model wasn't confused. It lacked context about when to use what.

Write tool descriptions like you're briefing a new team member who needs to know exactly when to call this function — and when not to.


Mistake 5: RAG and conversation history are not the same thing

They serve different purposes. Mixing them creates a mess.

  • Semantic memory (your RAG / vector store): the agent's library. Documentation, policies, domain knowledge. Content that doesn't change per user.
  • Episodic memory (conversation history): the agent's journal. What this specific user said, what decisions were made, the context of this session.

Querying conversation history through vector similarity is wrong. You need chronological recall, not semantic scoring. Loading all user history into the vector store pollutes domain knowledge with personal data.

// Two systems, two jobs
ContentRetriever documentRetriever =
    EmbeddingStoreContentRetriever.from(vectorStore); // semantic

ChatMemory sessionMemory = MessageWindowChatMemory.builder()
    .maxMessages(20)
    .id(userId)
    .build(); // episodic, scoped per user

RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .contentRetriever(documentRetriever) // semantic only
    .build();

Spring AI maps to the same separation: QuestionAnswerAdvisor for semantic retrieval, MessageChatMemoryAdvisor for conversation history. Same two-layer design, different API.
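The two layers only meet at prompt-assembly time. Here's a rough sketch of that final step, assuming the frameworks above have already done their jobs: semantic results arrive ranked by relevance, episodic messages arrive in chronological order, and the assembler just concatenates. All names are illustrative.

```java
import java.util.List;

// Sketch of combining semantic and episodic context into one prompt.
public class PromptAssembler {

    public static String assemble(String systemInstructions,
                                  List<String> retrievedDocs,   // semantic: ranked by similarity
                                  List<String> recentMessages,  // episodic: chronological order
                                  String userQuestion) {
        StringBuilder prompt = new StringBuilder(systemInstructions).append("\n\n");
        if (!retrievedDocs.isEmpty()) {
            prompt.append("Relevant knowledge:\n");
            retrievedDocs.forEach(d -> prompt.append("- ").append(d).append("\n"));
        }
        if (!recentMessages.isEmpty()) {
            prompt.append("Conversation so far:\n");
            recentMessages.forEach(m -> prompt.append(m).append("\n"));
        }
        return prompt.append("User: ").append(userQuestion).toString();
    }
}
```

Notice what the sketch does not do: it never sorts conversation history by similarity, and it never stores conversation turns in the document list. Each layer keeps its own ordering principle all the way to the final string.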


The reframe

If you've been building with LangChain4j or Spring AI, you've been doing Context Engineering already. You just didn't call it that.

  • RAG pipeline → semantic memory design
  • ChatMemory → episodic memory management
  • Tool calling → context for agent decision-making

The difference between an agent that works and one that doesn't is rarely the model. It's whether the model sees the right information, at the right time, in the right format.

More than half of enterprises now run agents in production, and a third of them name quality as their top barrier. Most teams respond by switching models. The actual fix, in my experience, is usually one of the five things above.


What's the hardest context engineering problem you've run into building Java agents? I'm especially curious whether the tooling gap vs Python is actually a problem in practice, or whether LangChain4j is good enough for most production use cases.

Kiryl Rusanau

I help businesses running on Java adopt AI without rewriting what already works. 7+ years building enterprise Java systems in FinTech — I add AI capabilities that are safe, measurable, and maintainable.