Memory in an AI agent refers to the ability to retain and retrieve past interactions. There are two types of memory used in this agent:
Short-Term Memory (Thread Memory) – Stored in LangChainGo's ConversationBuffer, allowing the agent to recall recent messages within an active conversation.
Long-Term Memory (Vector Store) – Stored in Weaviate, allowing the agent to retrieve past interactions across sessions using semantic search.
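To make the short-term layer concrete, here is a minimal sketch: one LangChainGo ConversationBuffer per thread. The threadBuffers map and bufferFor helper are illustrative glue, not part of LangChainGo; the long-term Weaviate layer is covered below.

```go
package main

import (
	"context"
	"fmt"

	"github.com/tmc/langchaingo/memory"
)

// One short-term buffer per conversation thread.
var threadBuffers = map[string]*memory.ConversationBuffer{}

// bufferFor lazily creates the buffer for a thread ID.
func bufferFor(threadID string) *memory.ConversationBuffer {
	if b, ok := threadBuffers[threadID]; ok {
		return b
	}
	b := memory.NewConversationBuffer()
	threadBuffers[threadID] = b
	return b
}

func main() {
	ctx := context.Background()
	buf := bufferFor("thread-123")

	// Record one user/AI exchange in short-term memory.
	_ = buf.SaveContext(ctx,
		map[string]any{"input": "My name is Ada."},
		map[string]any{"output": "Nice to meet you, Ada!"},
	)

	// Recent messages are recalled from the same buffer.
	vars, _ := buf.LoadMemoryVariables(ctx, map[string]any{})
	fmt.Println(vars["history"])
}
```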
Embeddings
Embeddings are numerical representations of text data, enabling AI to understand semantic similarity.
How This Works:
When a user sends a message, it is converted into an embedding vector.
The agent searches for similar past messages in Weaviate to provide context-aware responses.
This allows the AI to recall relevant information even if exact words don't match.
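Here is a minimal sketch of the first step, turning a message into an embedding vector with LangChainGo's OpenAI client (assumes OPENAI_API_KEY is set; error handling kept terse):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() {
	ctx := context.Background()

	// The OpenAI client doubles as an embedding client.
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	// The user's message becomes a dense vector; Weaviate later
	// compares such vectors to find semantically similar messages.
	vec, err := embedder.EmbedQuery(ctx, "How do I reset my password?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("embedding has %d dimensions\n", len(vec))
}
```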
Thread Memory and Weaviate
Thread Memory
A thread represents an ongoing conversation session. Each user (or organization) can have multiple threads, enabling:
Context Retention – AI remembers the conversation within a thread.
Multi-Tenancy – Separate memory per user & organization.
Parallel Conversations – Users can run multiple independent chats.
Each thread has a unique ID, which is used to store messages, context, and embeddings in Weaviate.
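As a sketch, a thread could be modeled like this; the Thread struct and its fields are illustrative rather than a fixed schema, and google/uuid is an assumed choice for ID generation:

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

// Thread identifies one conversation and the tenant it belongs to.
// Its ID keys both the short-term buffer and the objects stored in
// Weaviate, so memory never leaks across threads or tenants.
type Thread struct {
	ID     string // unique thread ID
	OrgID  string // organization (tenant)
	UserID string // owning user
}

func main() {
	t := Thread{ID: uuid.NewString(), OrgID: "org-1", UserID: "user-7"}
	fmt.Printf("new thread %s for %s/%s\n", t.ID, t.OrgID, t.UserID)
}
```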
Why Use Weaviate for Embeddings?
Efficient vector search for retrieving past interactions or documents.
Multi-tenant support for storing embeddings per user & organization.
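Here is a sketch of wiring LangChainGo's Weaviate vector store to the embedder; the scheme, host, and index name are placeholder values for a local instance:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/weaviate"
)

func main() {
	ctx := context.Background()

	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	// Connect to a local Weaviate instance; one index (class)
	// holds the embedded messages and documents.
	store, err := weaviate.New(
		weaviate.WithScheme("http"),
		weaviate.WithHost("localhost:8080"),
		weaviate.WithIndexName("AgentMemory"),
		weaviate.WithEmbedder(embedder),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Tenant IDs travel in metadata so results can be scoped
	// per user and organization.
	_, err = store.AddDocuments(ctx, []schema.Document{{
		PageContent: "Ada prefers answers with code examples.",
		Metadata:    map[string]any{"org_id": "org-1", "user_id": "user-7"},
	}})
	if err != nil {
		log.Fatal(err)
	}

	// Semantic search: matches by meaning, not exact keywords.
	docs, err := store.SimilaritySearch(ctx, "How does Ada like replies?", 3)
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range docs {
		fmt.Println(d.PageContent)
	}
}
```

Because org_id and user_id ride along as metadata, tenant scoping reduces to filtering on those fields at query time.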
Streaming and Agent Capabilities
Streaming Architecture
Streaming delivers AI responses in real time instead of waiting for the entire message to be generated before sending it:
The AI starts generating a response.
The response is sent to the client in small chunks.
The client displays each chunk as it arrives (see the sketch below).
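With LangChainGo, streaming is a call option: a callback fires for every chunk as it is generated. A minimal sketch follows; a real handler would forward each chunk to the client, for example over SSE or WebSockets:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() {
	ctx := context.Background()

	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}

	// The streaming func runs once per chunk, as soon as the
	// model produces it.
	_, err = llms.GenerateFromSinglePrompt(ctx, llm,
		"Explain vector embeddings in one paragraph.",
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			fmt.Print(string(chunk))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```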
Benefits of Streaming
Lower perceived latency – the first tokens reach the user almost immediately.
Improved interactivity – the UX feels real-time instead of request-and-wait.
More efficient delivery – the server forwards tokens as they are produced instead of buffering the entire response.
Agent Capabilities
Maintain conversational memory per thread
Maintain multi-tenancy with org_id and user_id segmentation
Store and retrieve relevant documents from Weaviate
Use OpenAI's LLM for response generation
Stream responses to clients in real time
Provide a structured API
Key Components of the Agent
1. Agent Manager – Orchestrates interactions between the OpenAI LLM, the Weaviate vector store, and the memory system; manages thread-based memory allocation, streaming response handling, and multi-tenant query execution.
2. Memory System – Uses two layers of memory: short-term (LangChainGo's ConversationBuffer) and long-term (the Weaviate vector store); supports efficient multi-turn conversations and persistent knowledge retention across sessions.
3. API Layer – Provides RESTful endpoints for querying the AI agent, retrieving memory, adding documents to memory (/v1/agent/memory/update), and importing datasets into Weaviate (/v1/agent/memory/import/:org_id/:user_id).
4. Retrieval-Augmented Generation (RAG) – Enhances response quality by retrieving similar documents from Weaviate, grounding generated answers in external knowledge, and improving context-awareness with past user inputs.
5. Middleware & Logging – Logs all API requests, monitors memory and vector store operations, and provides request metadata for debugging.
These core components work together to create a robust, scalable AI agent infrastructure – from the foundational RAG system through memory management up to the API interface.
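To make the orchestration concrete, here is a sketch of how an agent manager could tie these pieces together for one RAG turn. The AgentManager type, its Ask method, and the onChunk callback are illustrative, not from a library:

```go
package agent

import (
	"context"
	"fmt"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/memory"
	"github.com/tmc/langchaingo/vectorstores/weaviate"
)

// AgentManager orchestrates the LLM, the Weaviate store, and the
// per-thread short-term buffers (buffers must be created with make).
type AgentManager struct {
	llm     llms.Model
	store   weaviate.Store
	buffers map[string]*memory.ConversationBuffer
}

// Ask runs one RAG turn: retrieve context, build a grounded prompt,
// stream the answer, and record the exchange in thread memory.
func (a *AgentManager) Ask(ctx context.Context, threadID, question string, onChunk func([]byte)) error {
	// 1. Long-term memory: semantically similar documents from Weaviate.
	docs, err := a.store.SimilaritySearch(ctx, question, 3)
	if err != nil {
		return err
	}
	contextText := ""
	for _, d := range docs {
		contextText += d.PageContent + "\n"
	}

	// 2. Ground the prompt in the retrieved context.
	prompt := fmt.Sprintf("Context:\n%s\nQuestion: %s", contextText, question)

	// 3. Stream the answer chunk by chunk.
	answer, err := llms.GenerateFromSinglePrompt(ctx, a.llm, prompt,
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			onChunk(chunk)
			return nil
		}))
	if err != nil {
		return err
	}

	// 4. Short-term memory: save the turn in this thread's buffer.
	buf, ok := a.buffers[threadID]
	if !ok {
		buf = memory.NewConversationBuffer()
		a.buffers[threadID] = buf
	}
	return buf.SaveContext(ctx,
		map[string]any{"input": question},
		map[string]any{"output": answer},
	)
}
```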
Middleware and Logging
Handles:
Logging all API requests (Zerolog).
Monitoring memory and vector store operations.
Providing request metadata for debugging.
Ensures:
Visibility into agent interactions.
Easier debugging of API calls.
Structured logs for observability.
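As a sketch, request logging with Zerolog can live in a plain net/http middleware; the router and the /v1/agent/query handler below are simplified stand-ins for the real API layer:

```go
package main

import (
	"log"
	"net/http"
	"os"
	"time"

	"github.com/rs/zerolog"
)

// requestLogger emits one structured log line per request, with
// method, path, and duration for debugging and observability.
func requestLogger(logger zerolog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		logger.Info().
			Str("method", r.Method).
			Str("path", r.URL.Path).
			Dur("duration", time.Since(start)).
			Msg("api request")
	})
}

func main() {
	logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

	mux := http.NewServeMux()
	mux.HandleFunc("/v1/agent/query", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok")) // stand-in for the real agent handler
	})

	log.Fatal(http.ListenAndServe(":8080", requestLogger(logger, mux)))
}
```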
Next Steps
Now that you understand the concepts and architecture driving our agent, the final part will guide you step-by-step through the implementation details—from project setup and code organization to testing the fully functional system.