Building an AI Assistant Application in Java

In my previous article, I discussed how Helidon integrates with LangChain4J. While the article provided a solid foundation, some readers pointed out the lack of a complete, hands-on example. This time, we’ll fix that by building a fully functional, practical AI-powered Java application.

We’ll create the Helidon Assistant — a chatbot with a web UI trained to answer questions about the Helidon framework. By “trained,” I mean it will be capable of answering questions based on the full Helidon documentation.

In this article, we’ll explore how to preprocess AsciiDoc files for ingestion into an embedding store and how to make the application stateless using an AI-powered summarization mechanism. Let’s dive in!

Project Overview

The Helidon Assistant is built with the following technologies:

  • Java 21
  • Helidon SE 4 as the runtime
  • LangChain4J for AI integration
  • In-memory Embedding Store for storing and retrieving document embeddings
  • OpenAI GPT-4o as the default chat model
  • Helidon SE static content feature for serving the web UI
  • Bulma CSS framework for clean and minimalistic UI styling

The application is organized into three main layers:

  • RESTful Service Layer – defines the public REST API and serves the web UI.
  • AI Services Layer – defines LangChain4J AI services: one for answering user questions, and another for summarizing the conversation.
  • RAG Layer – handles ingestion: reading AsciiDoc files, preprocessing content, creating embeddings, and storing them in the embedding store.

The architecture diagram is shown below:

Building and Running the Project

The project is available on GitHub here. You can either clone the repository or browse the sources directly on GitHub. I’ll refer to it throughout the article.

Before building and running the project, you need to configure the path to the AsciiDoc documentation the assistant will work with. If you already have the documentation locally—great! If not, you can clone the Helidon repository from GitHub:

git clone https://github.com/helidon-io/helidon.git

The documentation files are located in docs/src/main/asciidoc/mp.

Next, update the application configuration file located at src/main/resources/application.yaml with the path to your AsciiDoc files:

app:
  root: "//home/dmitry/github/helidon/docs/src/main/asciidoc/mp"
  inclusions: "*.adoc"

Make sure to adjust the root path to match your local environment. You can also use the inclusions and exclusions properties to filter which files under the root directory should be included during ingestion.
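
For example, to leave certain files out of the ingestion, you could add an exclusions pattern next to inclusions. The snippet below is purely illustrative; the file pattern is hypothetical, and I'm assuming exclusions accepts the same glob syntax as inclusions:

app:
  root: "/home/dmitry/github/helidon/docs/src/main/asciidoc/mp"
  inclusions: "*.adoc"
  exclusions: "*-internal.adoc"   # hypothetical pattern: skip internal-only pages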

Now you’re ready to build the application:

mvn clean package

And launch it:

java -jar target/helidon-assistant.jar

Once running, open your browser and go to http://localhost:8080. You’ll be greeted with the assistant’s web interface, where you can start asking questions.

Here are a few example questions to get you started:

  • How to use metrics with Helidon?
  • What does the @Retry annotation do?
  • How can I configure a web server?
  • How can I connect to a database?

In the next section, we’ll take a closer look at the build script and project dependencies.

Dependencies

The project uses a standard Maven pom.xml configuration recommended for Helidon SE applications, with several additional dependencies specific to this use case. Below is a commented snippet explaining the purpose of each dependency:

<dependencies>
  <!-- Helidon integration with LangChain4J -->
  <dependency>
    <groupId>io.helidon.integrations.langchain4j</groupId>
    <artifactId>helidon-integrations-langchain4j</artifactId>
  </dependency>

  <!-- OpenAI provider: required for using the GPT-4o chat model.
       Replace this with another provider if using a different LLM. -->
  <dependency>
    <groupId>io.helidon.integrations.langchain4j.providers</groupId>
    <artifactId>helidon-integrations-langchain4j-providers-open-ai</artifactId>
  </dependency>

  <!-- LangChain4J embeddings model used for RAG functionality -->
  <dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
  </dependency>

  <!-- AsciidoctorJ: used to parse and process AsciiDoc documentation files -->
  <dependency>
    <groupId>org.asciidoctor</groupId>
    <artifactId>asciidoctorj</artifactId>
    <version>${version.lib.asciidoctorj}</version>
  </dependency>

  <!-- Various Helidon dependencies required for the application
       to function properly -->
  <dependency>
    <groupId>io.helidon.webserver</groupId>
    <artifactId>helidon-webserver</artifactId>
  </dependency>
  <dependency>
    <groupId>io.helidon.webserver</groupId>
    <artifactId>helidon-webserver-static-content</artifactId>
  </dependency>
  <dependency>
    <groupId>io.helidon.http.media</groupId>
    <artifactId>helidon-http-media-jsonp</artifactId>
  </dependency>
  <dependency>
    <groupId>io.helidon.config</groupId>
    <artifactId>helidon-config-yaml</artifactId>
  </dependency>

  <!-- Logging -->
  <dependency>
    <groupId>io.helidon.logging</groupId>
    <artifactId>helidon-logging-jul</artifactId>
    <scope>runtime</scope>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <scope>runtime</scope>
  </dependency>
</dependencies>

You can use these dependencies for other AI-powered projects with minimal changes.
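
Note that the OpenAI provider also needs an API key and a model name, which are supplied through configuration rather than code. The snippet below sketches what such configuration typically looks like with Helidon's LangChain4J OpenAI provider; the exact keys and values here are my assumptions, so check application.yaml in the repository for the authoritative version:

langchain4j:
  open-ai:
    chat-model:
      enabled: true
      api-key: "your-openai-api-key"   # placeholder; keep real keys out of source control
      model-name: "gpt-4o"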

Application Main Class

The ApplicationMain class serves as the application’s entry point. It contains the main method, which performs the following steps:

  1. Enables runtime logging.
  2. Loads the application configuration.
  3. Ingests documentation into the embedding store.
  4. Sets up web server routing to serve static web pages and handle user requests.
  5. Starts the web server.

Below is a snippet of the main method:

public static void main(String[] args) {
    // Make sure logging is enabled as the first thing
    LogConfig.configureRuntime();

    var config = Services.get(Config.class);

    // Initialize embedding store
    Services.get(DocsIngestor.class)
            .ingest();

    // Static content setup
    var staticContentFeature = StaticContentFeature.builder()
            .addClasspath(cl -> cl.location("WEB")
                    .context("/ui")
                    .welcome("index.html"))
            .build();

    // Initialize and start web server
    WebServerConfig.builder()
            .addFeature(staticContentFeature)
            .config(config.get("server"))
            .routing(ApplicationMain::routing)
            .build()
            .start();
}

In the next section, I’ll explain how the embedding store is created and initialized.

Preparing AsciiDoc Files for Embeddings

Although AsciiDoc is lightweight and human-readable, the source content isn’t immediately ready for use in AI-powered retrieval. AsciiDoc files often contain structural directives like include statements, developer comments, attribute substitutions, and variables intended for conditional rendering or reuse. These elements are meaningful for human readers or documentation generators but can confuse or mislead a language model if left unprocessed. Additionally, formatting artifacts and metadata can introduce noise. Without proper preprocessing, the resulting embeddings might be irrelevant or misleading, which degrades the quality and accuracy of the assistant’s responses.

To address this, we apply a structured preprocessing pipeline.

  • AsciidoctorJ Integration: We use the official AsciidoctorJ library to fully parse AsciiDoc documents. This library resolves include directives automatically and gives us a structured representation of the content.
  • Section-Based Chunking: We group content elements by their surrounding section and generate one embedding per section. This preserves logical and thematic boundaries and helps ensure responses remain relevant.
  • Preserve Atomic Elements: We make sure that tables and code snippets are not split across chunks. This is critical to retain the contextual meaning of examples and structured content.
  • Attach Metadata: Each chunk is enriched with metadata such as document title, relative file path, and section index. This helps reconstruct the document context when presenting answers.
  • Repeat for Each File: This process is repeated for each .adoc file identified by the inclusion pattern in the configuration.

This preprocessing ensures that the AI retrieves precise, coherent documentation segments in response to user queries, resulting in more accurate and helpful answers.
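
To make the parsing step more concrete, here is a minimal sketch of how AsciidoctorJ can load a file and walk its sections. It is an illustration rather than the project's actual AsciiDocPreprocessor, and the "one chunk per top-level section, title plus rendered content" logic is a deliberate simplification:

// Hypothetical sketch (uses org.asciidoctor.Asciidoctor, Options, SafeMode
// and the org.asciidoctor.ast types), not the real AsciiDocPreprocessor.
static List<String> sectionChunks(File adocFile) {
    var chunks = new ArrayList<String>();
    try (Asciidoctor asciidoctor = Asciidoctor.Factory.create()) {
        Document doc = asciidoctor.loadFile(adocFile,
                Options.builder()
                        .safe(SafeMode.UNSAFE)  // allow include:: resolution from the local checkout
                        .build());
        for (StructuralNode node : doc.getBlocks()) {
            if (node instanceof Section section) {
                // One candidate chunk per section: title plus rendered content
                chunks.add(section.getTitle() + "\n" + section.getContent());
            }
        }
    }
    return chunks;
}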

Here’s how the implementation is structured:

  • AsciiDocPreprocessor.java parses the file and produces a list of document chunks.
  • ChunkGrouper.java groups chunks into section-based logical units.
  • FileLister.java reads the directory path and applies inclusion/exclusion patterns.
  • DocsIngestor.java orchestrates the overall process: listing files, extracting and grouping chunks, converting them to TextSegment objects, and storing the resulting embeddings.

A simplified snippet from DocsIngestor.java demonstrates the ingestion logic:

public void ingest() {
    var files = FileLister.listFiles(root, exclusions, inclusions);
    var processor = new AsciiDocPreprocessor();
    var grouper = new ChunkGrouper(1000);

    for (Path path : files) {
        var chunks = processor.extractChunks(path.toFile());
        var groupedChunks = grouper.groupChunks(chunks);

        List<TextSegment> segments = new ArrayList<>();
        for (int i = 0; i < groupedChunks.size(); i++) {
            var chunk = groupedChunks.get(i);
            var metadata = new Metadata()
                    .put("source", path.toFile().getAbsolutePath())
                    .put("chunk", String.valueOf(i + 1))
                    .put("type", chunk.type().name())
                    .put("section", chunk.sectionPath());

            segments.add(TextSegment.from(chunk.text(), metadata));
        }

        var embeddings = embeddingModel.embedAll(segments);
        embeddingStore.addAll(embeddings.content(), segments);
    }
}

Serving Static Web Pages

The UI consists of a single index.html file located in the resources/WEB directory. It's styled using the Bulma CSS framework, which is designed to be JavaScript-free.

There is still a small piece of JavaScript, though. It sends user messages to the backend when the Send button is clicked, updates the chat window with the response, and manages the conversation summary state.

To serve this page, we register a StaticContentFeature during the web server startup. The code below demonstrates how it's done in the main method.

// Static content setup
var staticContentFeature = StaticContentFeature.builder()
        .addClasspath(cl -> cl.location("WEB")
                .context("/ui")
                .welcome("index.html"))
        .build();

// Initialize and start web server
WebServerConfig.builder()
        .addFeature(staticContentFeature)
        ...

The /ui path is registered to serve the static content. If a user tries to open any other path, they are redirected to /ui. This is done in the routing method.

static void routing(HttpRouting.Builder routing) {
    routing.any("/", (req, res) -> {
                // showing the capability to run on any path, and redirecting from root
                res.status(Status.MOVED_PERMANENTLY_301);
                res.headers().set(UI_REDIRECT);
                res.send();
            })
            .register("/chat", Services.get(ChatBotService.class));
}
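
The UI_REDIRECT constant used above is simply a prebuilt Location header pointing at /ui. Its declaration looks roughly like the line below (a sketch based on Helidon 4's header API; see ApplicationMain.java in the repository for the exact code):

// Prebuilt "Location: /ui" header used by the redirect (hypothetical declaration)
private static final Header UI_REDIRECT = HeaderValues.create(HeaderNames.LOCATION, "/ui");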

Processing User Requests

When the user clicks the Send button in the UI, a server call to the /chat endpoint is initiated. This request sends the user’s message along with a conversation summary to the server. We’ll discuss conversation summaries in a later section—let’s first focus on how the request is processed on the server side.

User requests to /chat are handled by the ChatBotService.java class. This class is registered during web server initialization, as shown in ApplicationMain.java. Below is a simplified snippet that demonstrates how it’s done:

static void routing(HttpRouting.Builder routing) {
    routing.register("/chat", Services.get(ChatBotService.class));
    ...
}

The ChatBotService class contains the chatWithAssistant method, which handles incoming requests. It performs the following steps:

  1. Extracts the user’s message and conversation summary from the request.
  2. Invokes ChatAiService, passing the message and summary to generate a response.
  3. Uses SummaryAiService to create an updated conversation summary.
  4. Builds a JSON object containing the response and the updated summary, and sends it back to the client.

Here’s the simplified code for the chatWithAssistant method:

private void chatWithAssistant(ServerRequest req, ServerResponse res) {
    var json = req.content().as(JsonObject.class);
    var message = json.getString("message");
    var summary = json.getString("summary");

    var answer = chatAiService.chat(message, summary);
    var updatedSummary = summaryAiService.chat(summary, message, answer);

    var returnObject = JSON.createObjectBuilder()
                .add("message", answer)
                .add("summary", updatedSummary)
                .build();
    res.send(returnObject);
}
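
With the server running, you can also exercise the endpoint directly, for example with curl (assuming the default port and an empty summary for the first message):

curl -X POST http://localhost:8080/chat \
     -H "Content-Type: application/json" \
     -d '{"message": "How to use metrics with Helidon?", "summary": ""}'

The response is a JSON object with two fields: message, the assistant's answer, and summary, the updated conversation summary that the client sends back with its next request.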

The ChatAiService.java class is implemented as a Helidon AI service. You can learn more about AI services and how to implement them in my previous article.

Here’s the relevant code:

@Ai.Service
public interface ChatAiService {

    @SystemMessage("""
            You are Frank, a helpful Helidon expert.

            Only answer questions related to Helidon and its components. If a question is not relevant to Helidon, 
            politely decline.

            Use the following conversation summary to keep context and maintain continuity:
            {{summary}}
            """)
    String chat(@UserMessage String question, @V("summary") String previousConversationSummary);
}
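
What makes the answers documentation-aware is retrieval-augmented generation: for every question, the most relevant segments are pulled from the embedding store and added to the model's context. The snippet below shows how a content retriever over an embedding store is typically built in plain LangChain4J; in the Helidon integration the retriever is supplied through the service registry, so treat this as a sketch (the maxResults value is an arbitrary illustration) and see the repository and my previous article for the actual wiring:

// Sketch only: a LangChain4J content retriever that looks up the most relevant
// documentation segments for each question (not the project's exact wiring).
ContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)   // the in-memory store filled by DocsIngestor
        .embeddingModel(embeddingModel)   // the same embedding model used during ingestion
        .maxResults(5)                    // illustrative value
        .build();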

Making the Application Stateless

In a typical chat application, the backend must maintain the full history of the conversation in order to understand the user’s intent. This is because language models like OpenAI’s GPT rely heavily on context — they need to see the dialogue leading up to the current question to provide an accurate and helpful answer. The longer and more complex the conversation, the more memory is required to hold that context.

However, storing chat history introduces challenges. If you’re running a single backend instance, you might store this state in memory. But in a production environment, especially in cloud-native deployments, applications often scale horizontally — meaning multiple instances of the backend may be running behind a load balancer. In such setups, traditional in-memory storage for chat history doesn’t work: the next request from the same user might be routed to a different backend instance that has no access to prior state.

This is where statelessness becomes critical. Stateless services are inherently scalable, easier to maintain, and more resilient. But to make a chatbot stateless without sacrificing conversation quality, we need a way to preserve and compress context — and that’s where AI-powered summarization comes in.

By summarizing the chat history into a compact form after every message, we replace a long list of messages with a lightweight, synthetic memory that still captures the user’s intent and context. This summary is sent along with the next message, enabling consistent, relevant responses while allowing each request to be handled independently.

The Helidon Assistant uses this technique to remain stateless and cloud-native, ensuring it can scale easily while maintaining meaningful conversations with users.

The summarizer is implemented as an AI service. You can read more about AI services and how to implement them in my previous article.

@Ai.Service
public interface SummaryAiService {

    @SystemMessage("""
        You are a conversation summarizer for an AI assistant. Your job is to keep a concise summary of the
        ongoing conversation to preserve context.
        Given the previous summary, the latest user message, and the AI's response, update the summary so it
        reflects the current state of the conversation.
        Keep it short, factual, and focused on what the user is doing or trying to achieve. Avoid rephrasing the
        entire response or repeating long parts verbatim.
        """)
    @UserMessage("""
        Previous Summary: 
        {{summary}}

        Last User Message:
        {{lastUserMessage}}

        Last AI Response:
        {{aiResponse}}
        """
    )
    String chat(@V("summary") String previousSummary,
                @V("lastUserMessage") String latestUserMessage,
                @V("aiResponse") String aiResponse);
}

Wrapping Up

That’s it — we’ve built a fully working, stateless AI assistant powered by Helidon and LangChain4J. Hopefully, everything is clear and nothing important was left out. But if something feels confusing or needs more explanation, I’d love to hear your thoughts. Feedback is always welcome — whether it’s a bug, a missing step, or just a better way to do things.

Want to dive into the code or try it yourself? You’ll find everything here:

GitHub: Helidon Assistant

Thanks for reading — and happy coding!
