RAG in N8N
Goal
In this post, I wanted to share my process of building a simple Retrieval-Augmented Generation (RAG) model usin n8n. My main goals were:
- Build a working RAG pipeline from scratch
- Create a basic evaluation framework
- Explore ideas for how I could improve it later
Understanding RAG
Here’s how I think about RAG.
Say I have a bunch of documents (in my case, mock insurance reports), and I want to build a chatbot that can answer questions using those documents. If I just send a question like:
What claims did Aisha Brown submit?
… to ChatGPT or another LLM, it’ll most unlikely fail unless I manually copy-paste the context.
But with RAG, I can automatically retrieve the right piece of text that match the user’s question, and then pass that along with the question to the LLM. Thats what makes the answer accurate.
The secret sauce behind RAG:
- Break up documents into chunks
- Embed those chunks using a model that turns text into vectors
- Store them in a vector database
- When a user asks a question, embed that too
- Do a vector similarity search to find relevant chunks
- Pass those chunks + the question to the LLM
It’s a simple idea, but it works really well.
What I needed to Build
- Chunking - How do I split the docs without breaking useful context?
- Embedding - I used Google Gemini to turn text into vectors.
- Vector store - Pinecone handles my vector database.
- LLM query - Once I have chunks, I send them + the prompt to the LLM.
My Stack
Here’s what I used:
- Python script for chunking and data prep
- Google Gemini for embeddings
- Pinecone as the vector database
- Google Sheets as a simple document store
- n8n to orchestrate everything
Generating Mock Insurance Claims
TO start, I create a Python script to generate fake insurance claim data. This way I could test everything end-to-end.
First a python script is used to generate random paramters to create our prompt.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
import random
from datetime import datetime, timedelta
incident_types = [
"car accident", "flood damage", "burglary", "fire loss",
"personal injury", "storm damage", "theft from vehicle", "water leak", "hail damage"
]
locations = [
"Perth, WA", "Brisbane, QLD", "rural NSW", "Sydney CBD", "Gold Coast",
"Melbourne, VIC", "Adelaide, SA", "Darwin, NT", "Hobart, TAS", "Canberra, ACT"
]
first_names = [
"John", "Tina", "Mohammed", "Eliza", "Tommy", "Natalie", "Raj", "Sophie", "Jacob", "Linda",
"Grace", "Lucas", "Aisha", "Dylan", "Mei", "Ethan", "Amelia", "Noah", "Liam", "Zara"
]
last_names = [
"Barnes", "Ho", "Faisal", "Chen", "Singh", "Brooks", "Patel", "Nguyen", "Wallace", "Park",
"Smith", "Brown", "Taylor", "Williams", "Jones", "Davis", "Lee", "Martin", "Clark", "Evans"
]
def generate_random_name():
return f"{random.choice(first_names)} {random.choice(last_names)}"
def random_date_within_last_year():
days_ago = random.randint(0, 365)
date = datetime.now() - timedelta(days=days_ago)
return date.strftime("%Y-%m-%d")
for item in _input.all():
item.json.incident_type = random.choice(incident_types)
item.json.location = random.choice(locations)
item.json.claimant_name = generate_random_name()
item.json.incident_date = random_date_within_last_year()
return _input.all()
Then I used those random values in a prompt:
1
2
3
4
5
6
7
8
9
10
Generate an unstructured insurance claim report from the perspective of an insurance field agent. The report should include:
- The type of incident: [[ $json.incident_type ]]
- The data of incident: [[ $json.incident_date ]]
- The location: [[ $json.location ]]
- The claimant's name: [[ $json.claimant_name ]]
- A brief narrative description of what happened
- Any suspicious elements or inconsistencies
- Recommended next steps
The tone should be professional and observational, not emotional. It should be in markdown format and be no longer than 1 page
THe prompt was fed to the LLM and saved to Google Sheets. Below is a sample of the generated document.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## Field Report - Burglary Claim
**Claimant:** Aisha Brown
**Date of Incident:** 2024-11-03
**Address of Incident:** Perth, WA
**Type of Incident:** Burglary
**Initial Narrative:**
Upon arrival at the property at approximately 10:00 AM local time on 2024-11-04, I conducted an on-site inspection and interviewed Ms. Brown. Ms. Brown reports that she left the premises at approximately 8:00 AM on 2024-11-03 and returned at approximately 6:00 PM to discover signs of forced entry. The back door appeared to have been kicked in, with visible splintering around the doorjamb. Ms. Brown reported the following items missing: a 55-inch television, a laptop computer (make and model: Dell XPS 13), and a jewelry box containing assorted costume jewelry. A police report has been filed (report number provided by claimant: 20241103-WA-PD-789).
**Observed Inconsistencies and Suspicious Elements:**
* **Lack of Significant Disruption:** While the back door showed signs of forced entry, the damage appeared to be relatively minimal, particularly given the reported value of the stolen items. The doorjamb was splintered, but the door itself was still relatively secure within the frame. Further investigation into the force required to cause the observed damage is recommended.
* **Specificity of Stolen Items:** Ms. Brown was able to provide a precise make and model number for the laptop computer. However, when questioned about the specific contents of the jewelry box, she was less certain, stating the contents were "assorted costume jewelry" but unable to provide a detailed list or estimated value. This discrepancy warrants further investigation of ownership and value.
* **Neighborhood Security:** While speaking to a neighbor (Mr. David Lee, address adjacent to Ms. Brown’s), I learned that several other burglaries have been reported in the area in the past month. Mr. Lee indicated that none of the previous incidents involved forced entry; they were all attributed to unlocked doors or windows. This raises the possibility that Ms. Brown may have also left the back door unlocked, although she vehemently denies doing so.
**Recommended Next Steps:**
1. **Forensic Evaluation of Door:** Engage a forensic expert to examine the back door and doorjamb to determine the likely amount of force required to cause the observed damage. This will help determine if the damage is consistent with forced entry or a staged event.
2. **Verification of Ownership and Value:** Request documentation (receipts, appraisals, purchase records) to verify Ms. Brown's ownership and the value of the claimed items, particularly the Dell XPS 13 laptop and the contents of the jewelry box. A detailed appraisal of the jewelry may be necessary.
3. **Police Report Follow-Up:** Contact the Perth Police Department to obtain a copy of the police report and inquire about the progress of their investigation. Determine if any suspects have been identified or if there are similarities to other recent burglaries in the area.
4. **Neighborhood Canvassing (Secondary):** Conduct further interviews with neighbors beyond Mr. Lee to gather additional information about suspicious activity in the area and to ascertain if anyone witnessed anything relevant to the reported burglary.
5. **Background Check:** A standard background check on Ms. Brown is recommended to assess any prior claims history or potential red flags.
End of Report
Chunking and Indexing the Documents
Here’s a screenshot of the n8n pipline I built:
The flow works like this:
- Fetch mocks reports from Google Sheets
- Chunk the documents
- Use Google Gemini to embed them
- Store the vectors in Pinecone
Before this work, I had to:
- Set up credentials for Gemini, Pinecone, and Google Sheets
- Create a Pinecone index (768 dimensions, consine similarity)
Here’s how a records looks inside Pinecone:
Chatbot Query Pipeline
Now the fun part - query the system.
Let’s say I enter this:
Can you give me a summary of the claims made by Aisha Brown? n8n does the following:
- Embeds the question using Gemini
- Queries Pinecone for similar vector
- Sends retrieved chunks + question to Gemini for an answer
Below is an example of the query + context sent to Gemini:
“System: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.\nIf you don’t know the answer, just say that you don’t know, don’t try to make up an answer.\n—————-\nContext: ## Field Report: Theft from Vehicle - Aisha Brown\n\nAgent: Unit 78, Regional Coverage\n\nDate of Report: 2024-06-14\n\nClaimant: Aisha Brown\n\nClaim Number: (To Be Assigned)\n\nDate of Incident: 2024-06-12\n\nType of Incident: Theft from Vehicle\n\nLocation: Unnamed roadside pull-over, approximately 15km north of Narrabri, NSW (GPS coordinates attached – recorded at site). Described by claimant as a generally well-maintained but isolated lay-by area frequently used by truck drivers and touring vehicles.\n\nNarrative Description:\n\n## Field Report - Burglary Claim\n\nClaimant: Aisha Brown\nDate of Incident: 2024-11-03\nAddress of Incident: Perth, WA\n\nType of Incident: Burglary\n\nInitial Narrative:\n\nUpon arrival at the property at approximately 10:00 AM local time on 2024-11-04, I conducted an on-site inspection and interviewed Ms. Brown. Ms. Brown reports that she left the premises at approximately 8:00 AM on 2024-11-03 and returned at approximately 6:00 PM to discover signs of forced entry. The back door appeared to have been kicked in, with visible splintering around the doorjamb. Ms. Brown reported the following items missing: a 55-inch television, a laptop computer (make and model: Dell XPS 13), and a jewelry box containing assorted costume jewelry. A police report has been filed (report number provided by claimant: 20241103-WA-PD-789).\n\nObserved Inconsistencies and Suspicious Elements:\n\n4. Further Investigation of Claim History: Conduct a thorough investigation of Ms. Brown’s claim history to identify any patterns or prior claims related to weather damage. Clarify any discrepancies identified between previous claims and the current claim.\n\nConclusion:\n\nFurther investigation is required to validate the claimant’s account and determine the legitimacy of the claim. Discrepancies in the available information warrant a thorough and comprehensive investigation.\nHuman: can you give a summary of the claims made by Aisha Brown”
Here’s what came back:
{ “response”: “Aisha Brown made two claims:\n\n1. Theft from Vehicle: On 2024-06-12, Aisha Brown reported a theft from her vehicle at an unnamed roadside pull-over approximately 15km north of Narrabri, NSW.\n2. Burglary: On 2024-11-03, Aisha Brown reported a burglary at her property in Perth, WA. Missing items include a 55-inch television, a Dell XPS 13 laptop, and a jewelry box with costume jewelry.\n” }
Screenshot below shows the vector embedding generated from our query
Screnshot below shows the K closest vector to our query vector
Below are the two insurance claim documents submitted by Aisha Brown, we can use this to verify whether our answer is correct
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## Field Report - Burglary Claim
**Claimant:** Aisha Brown
**Date of Incident:** 2024-11-03
**Address of Incident:** Perth, WA
**Type of Incident:** Burglary
**Initial Narrative:**
Upon arrival at the property at approximately 10:00 AM local time on 2024-11-04, I conducted an on-site inspection and interviewed Ms. Brown. Ms. Brown reports that she left the premises at approximately 8:00 AM on 2024-11-03 and returned at approximately 6:00 PM to discover signs of forced entry. The back door appeared to have been kicked in, with visible splintering around the doorjamb. Ms. Brown reported the following items missing: a 55-inch television, a laptop computer (make and model: Dell XPS 13), and a jewelry box containing assorted costume jewelry. A police report has been filed (report number provided by claimant: 20241103-WA-PD-789).
**Observed Inconsistencies and Suspicious Elements:**
* **Lack of Significant Disruption:** While the back door showed signs of forced entry, the damage appeared to be relatively minimal, particularly given the reported value of the stolen items. The doorjamb was splintered, but the door itself was still relatively secure within the frame. Further investigation into the force required to cause the observed damage is recommended.
* **Specificity of Stolen Items:** Ms. Brown was able to provide a precise make and model number for the laptop computer. However, when questioned about the specific contents of the jewelry box, she was less certain, stating the contents were "assorted costume jewelry" but unable to provide a detailed list or estimated value. This discrepancy warrants further investigation of ownership and value.
* **Neighborhood Security:** While speaking to a neighbor (Mr. David Lee, address adjacent to Ms. Brown’s), I learned that several other burglaries have been reported in the area in the past month. Mr. Lee indicated that none of the previous incidents involved forced entry; they were all attributed to unlocked doors or windows. This raises the possibility that Ms. Brown may have also left the back door unlocked, although she vehemently denies doing so.
**Recommended Next Steps:**
1. **Forensic Evaluation of Door:** Engage a forensic expert to examine the back door and doorjamb to determine the likely amount of force required to cause the observed damage. This will help determine if the damage is consistent with forced entry or a staged event.
2. **Verification of Ownership and Value:** Request documentation (receipts, appraisals, purchase records) to verify Ms. Brown's ownership and the value of the claimed items, particularly the Dell XPS 13 laptop and the contents of the jewelry box. A detailed appraisal of the jewelry may be necessary.
3. **Police Report Follow-Up:** Contact the Perth Police Department to obtain a copy of the police report and inquire about the progress of their investigation. Determine if any suspects have been identified or if there are similarities to other recent burglaries in the area.
4. **Neighborhood Canvassing (Secondary):** Conduct further interviews with neighbors beyond Mr. Lee to gather additional information about suspicious activity in the area and to ascertain if anyone witnessed anything relevant to the reported burglary.
5. **Background Check:** A standard background check on Ms. Brown is recommended to assess any prior claims history or potential red flags.
End of Report
the second claim that Aisha made
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
## Field Report: Theft from Vehicle - Aisha Brown
**Agent:** Unit 78, Regional Coverage
**Date of Report:** 2024-06-14
**Claimant:** Aisha Brown
**Claim Number:** (To Be Assigned)
**Date of Incident:** 2024-06-12
**Type of Incident:** Theft from Vehicle
**Location:** Unnamed roadside pull-over, approximately 15km north of Narrabri, NSW (GPS coordinates attached – recorded at site). Described by claimant as a generally well-maintained but isolated lay-by area frequently used by truck drivers and touring vehicles.
**Narrative Description:**
Claimant, Aisha Brown, reports parking her 2021 Toyota RAV4 (Registration ABC-123) at the aforementioned location at approximately 13:00 on 2024-06-12. Ms. Brown states she left the vehicle unattended for a period of approximately one hour to engage in hiking along a nearby track. Upon returning to the vehicle at approximately 14:00, she discovered the rear passenger side window had been shattered. Numerous items inside the vehicle were missing, including:
* One (1) Apple MacBook Pro (16 inch, 2021 model) - Serial Number provided separately.
* One (1) Louis Vuitton Monogram Neverfull GM handbag - No serial number available.
* Assorted travel documents (Passport, driver's license, etc.) – Claimant has provided copies of these documents separately.
* Approximately $300 AUD in cash.
Ms. Brown contacted local police (Narrabri Police Station) who attended the scene and took a statement. Incident report number is pending. Damage to the vehicle consists solely of the shattered rear passenger side window. No other points of entry appear to have been compromised. Claimant secured the vehicle with temporary patching provided by a passing motorist.
**Suspicious Elements/Inconsistencies:**
* **High-Value Items Left Visible:** The claimant stated the handbag containing the Macbook Pro was positioned on the rear passenger seat, clearly visible from the exterior of the vehicle. This seems atypical given the isolated location and the inherent risk of theft.
* **Lack of Other Signs of Disturbance:** The area immediately surrounding the vehicle showed no obvious signs of disturbance beyond the shattered glass. No footprints or discarded items were apparent, suggesting a potentially quick and efficient operation.
* **Specificity of Lost Items:** While the claimant provided the serial number for the MacBook Pro, she was unable to recall any identifying features of the handbag, further than its model and style. While not inherently suspicious, it warrants further scrutiny.
* **Limited Wi-Fi/Cell Service:** The claimant stated that she was unable to call or use location based services due to low or no Wi-Fi or cellular data coverage. This is potentially true based on geographical conditions, although testing signal strength at the specific coordinates is advised.
**Recommended Next Steps:**
* **Obtain Narrabri Police Incident Report:** Crucial to verify the claimant's statement and ascertain any findings of their investigation.
* **Contact Witness Contacted:** Claimant said she contacted a passing motorist to assist in securing the vehicle. Identify and interview this witness. Focus on verifying time frames.
* **Review Claimant's Social Media:** Check for any relevant posts or activity around the date of the incident, including any mentions of travel plans or recent acquisitions.
* **Verification of Ownership:** Confirm ownership of the specified items via purchase receipts or other relevant documentation.
* **Assess Comparable Claims:** Review historical claim data for similar theft incidents in the Narrabri region to identify any patterns or trends.
* **Consider Formal Interview:** Based on initial findings, a formal interview with the claimant may be necessary to address the identified inconsistencies and clarify details.
Evaluation pipeline
TODO




