ODI Chat

Your Privacy

We care about your privacy and rights.

Please refer to the Open Data Institute (ODI) Privacy policy for general terms including legal jurisdiction.

In addition to the above, the following outlines how interactions with an AI Assistant are managed and what rights you have. It is important to read and understand this so you are clear on the complete flow of data necessary to make the AI Assistants work.

  1. Storage of conversations
    We store all your conversations in our data storage location. This includes your queries, the AI's responses, and any ratings you provide. This information is tied to your user account, which is stored in the same location. Conversations are encrypted in transit, but they are not encrypted at rest. A sketch of what a stored conversation record might look like appears after this list.

  2. Why we store conversation history
    Our AI system needs to remember past conversations to provide you with relevant responses. Without this, the experience would not function as intended.

  3. Deleting conversations
    When you delete a conversation, it's permanently removed from our active system. However, it may still exist in system backups, which we keep to protect against system failures or cyber-attacks. Backups are retained daily for the last seven days, weekly for the last month, and monthly for the last year. This is standard practice in digital systems.

  4. Assistant models and providers
    Each assistant uses a general-purpose large language model (LLM) provided by a third-party service. In order to answer a query, we have to send your query and conversation history to that model. This means the model provider is able to see your query and conversation history. You can find the specific provider and model details on the assistant's information page, where you'll also find a link to the provider's privacy policy.

  5. Ownership and rights

    You retain ownership of the queries you submit. By using the system, you grant the Open Data Institute (ODI) a royalty-free, perpetual license to use and share your queries for the purposes outlined in our Data use and sharing section. You can copy your conversations using the copy button provided with each response.

    The generated responses may be subject to copyright laws, and rights may be exclusive, shared, or non-existent depending on the nature of the content. Claims made within generated responses are the responsibility of the creator only insofar as those claims are present in the creator's source content, and only to the extent to which the creator is responsible for that source content.

  6. Data use and sharing
    We do not sell or transfer your data to third parties for advertising or profit. As a research institute, we may share data with research partners, and you will always be informed when this happens. The amount of data shared will be proportional to the research question, and shared data is never retained by research partners. We do not permit onward sharing of this data. Unfortunately, we do not offer an opt-out from this sharing; if you disagree, we recommend not using our service.

  7. Data storage location
    All data is stored securely in MongoDB Cloud within the EU. We ensure your data is kept safe following regional standards and regulations.

  8. Service hosting
    Our service is hosted on Digital Ocean's cloud infrastructure in the UK, and we use Cloudflare to provide resilience and protect against cyber threats.
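
For illustration, a stored conversation record (item 1 above) might look something like the MongoDB-style document below. Every field name here is an assumption made for the sake of example; the actual schema is not published.

```python
# Illustrative sketch only: every field name is an assumption. It shows
# the kind of record implied by items 1 and 7 above: queries, responses,
# and ratings, tied to a user account, stored in MongoDB Cloud.
conversation_record = {
    "user_id": "000000000000000000000000",  # reference to the stored user account
    "created_at": "2025-01-15T10:15:00Z",
    "messages": [
        {"role": "user",
         "content": "What was the main theme of the ODI Summit in 2019?"},
        {"role": "assistant",
         "content": "...",          # the generated response
         "sources": ["..."],        # the sources shown alongside it
         "rating": "helpful"},      # optional rating supplied by the user
    ],
}
```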

We hope this policy helps clarify how we manage your data. If you have any concerns, please reach out before using the service.

Terms of Use

Please refer to the terms of use of the Open Data Institute.

About the AI Assistants: How they work

Our AI assistants use a Retrieval-Augmented Generation (RAG) approach to provide accurate and relevant answers to your queries. Here's a breakdown of how the system operates, whether it's your first query or a follow-up.


The First Query in a Conversation

  1. Retrieving Relevant Content
    When you make your initial query, the system searches our database for content that most closely matches your question and is likely to provide the best answer.

  2. Generating a Response
    We then send your query and the retrieved content to a large language model (LLM). The LLM uses this information to generate a response that addresses your query. A sketch of this flow appears after this list.

  3. Displaying the Response and Sources
    You receive the response along with a list of all sources used. These sources show where the content in step (1) came from, ensuring transparency.

    • Guardrails: We instruct the LLM to only respond based on the supplied content, avoiding the generation of answers that aren't supported by the data.
    • Limitations: While the LLM uses the content provided to formulate an answer, we don't have visibility into the exact phrases or sources it relied upon. Therefore, we list all relevant sources to cover the range of materials referenced.
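
As a rough sketch, the three steps above might look like the following Python. The helper names (`search_content`, `call_llm`) and the guardrail wording are hypothetical placeholders, not the actual ODI implementation.

```python
# A minimal sketch of the first-query RAG flow, under the assumption of
# two hypothetical hooks for the retrieval backend and the LLM provider.

def search_content(query: str, top_k: int = 5) -> list[dict]:
    """Placeholder: search the content database for the best matches."""
    raise NotImplementedError

def call_llm(system: str, prompt: str) -> str:
    """Placeholder: send a prompt to the third-party language model."""
    raise NotImplementedError

GUARDRAIL = ("Answer using ONLY the supplied content. If the content does "
             "not support an answer, say so rather than guessing.")

def answer_first_query(query: str) -> dict:
    # Step 1: retrieve the content that most closely matches the query.
    documents = search_content(query, top_k=5)

    # Step 2: send the query and retrieved content to the LLM, with the
    # guardrail instruction so it answers only from the supplied content.
    context = "\n\n".join(doc["text"] for doc in documents)
    answer = call_llm(system=GUARDRAIL,
                      prompt=f"Content:\n{context}\n\nQuestion: {query}")

    # Step 3: return the response with ALL retrieved sources, since we
    # cannot see which of them the LLM actually relied upon.
    return {"answer": answer, "sources": [doc["url"] for doc in documents]}
```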

Handling Follow-Up Queries

  1. Processing the Follow-Up Query
    When you ask a follow-up question, the system combines this new query with the conversation history and previously retrieved context. It then sends all this information to the LLM to determine if it can respond using the existing data.

  2. Direct Response (If Possible)
    If the LLM finds an answer based on the conversation history and context, we display it to you directly.

  3. Query Resolution (If Needed)
    If the LLM can't answer the query from existing information, it resolves the entities and context in your query to form a precise question. For example, if your initial query was "What was the main theme of the ODI Summit in 2019?" and the follow-up is "Who were the keynote speakers at the event?", the system resolves "the event" to "the ODI Summit in 2019" to understand what you're referring to.

    This process also works if you introduce a completely new topic; the system then treats it like a new conversation.

  4. Retrieving and Using New Content
    Using the resolved query, the system retrieves new content from our database. This content, along with the conversation history, is sent to the LLM to generate a response.

  5. Displaying the Updated Response
    Finally, we show you the response, complete with the relevant sources used. A sketch of this follow-up flow appears after this list.
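
Continuing the sketch from the first-query section, the follow-up flow might look like this. The `NEEDS_RETRIEVAL` sentinel is purely an assumption about how the LLM could signal that the existing information is insufficient, and `resolve_query` is sketched in the next section.

```python
# A sketch of the follow-up flow, reusing the hypothetical search_content,
# call_llm and GUARDRAIL from the first-query sketch. The sentinel-string
# mechanism below is an assumption, not the published implementation.

NEEDS_RETRIEVAL = "NEEDS_RETRIEVAL"

def answer_follow_up(query: str, history: list[dict], context: str) -> dict:
    # Step 1: ask the LLM to answer from the history and prior context.
    attempt = call_llm(
        system=(GUARDRAIL + f" Reply '{NEEDS_RETRIEVAL}' if the supplied "
                "content and history cannot answer the question."),
        prompt=f"History: {history}\n\nContent:\n{context}\n\nQuestion: {query}",
    )

    # Step 2: direct response, if the existing information was enough.
    if attempt != NEEDS_RETRIEVAL:
        return {"answer": attempt, "sources": []}

    # Step 3: resolve vague references ("the event") into a precise,
    # self-contained question (see resolve_query in the next section).
    resolved = resolve_query(query, history)

    # Step 4: retrieve new content using the resolved query.
    documents = search_content(resolved, top_k=5)
    context = "\n\n".join(doc["text"] for doc in documents)
    answer = call_llm(system=GUARDRAIL,
                      prompt=f"Content:\n{context}\n\nQuestion: {resolved}")

    # Step 5: display the updated response with the sources used.
    return {"answer": answer, "sources": [doc["url"] for doc in documents]}
```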


Our Approach to Query Resolution

The query resolution process is a custom feature we've developed to enhance the LLM's capabilities. It ensures that when references are vague or when topics shift, the system can still deliver accurate results by resolving entities and retrieving appropriate content.
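
One common way to implement this kind of resolution is to ask the LLM itself to rewrite the question, and that is what the sketch below assumes; the prompt wording is hypothetical, with only the intended behaviour (rewriting "the event" to "the ODI Summit in 2019") taken from the example above.

```python
# A sketch of LLM-based query resolution, completing the follow-up
# example above (reuses the hypothetical call_llm placeholder).

def resolve_query(query: str, history: list[dict]) -> str:
    return call_llm(
        system=("Rewrite the user's question as a single self-contained "
                "question, replacing pronouns and vague references with "
                "the entities they refer to in the conversation history."),
        prompt=f"History: {history}\n\nQuestion: {query}",
    )

# E.g. after "What was the main theme of the ODI Summit in 2019?", the
# follow-up "Who were the keynote speakers at the event?" should resolve
# to "Who were the keynote speakers at the ODI Summit in 2019?".
```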


This is how our RAG system operates, providing you with relevant, data-driven responses while ensuring transparency and reliability. If you have any questions about how the system works, feel free to reach out!

Open Data Institute

Open Data Institute, 4th Floor, Kings Place, 90 York Way, London N1 9AG

[email protected] · Company 08030289 · VAT 143 7796 80