ChromaDB for the SQL Mind

PHP中文网 • February 25, 2025, 12:50:31 • Programming

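Hi! Chroma DB is a vector database that is very useful for building GenAI applications. In this article, I explore how to run queries on Chroma DB by looking at the equivalent operations in MySQL.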

Schema

Unlike SQL, you cannot define your own schema. In Chroma you get a fixed set of columns, each with its own purpose:

    import chromadb

    # set up the client
    client = chromadb.Client()
    collection = client.create_collection(name="name")

    collection.add(
        documents=["str1", "str2", "str3"],
        ids=["1", "2", "3"],  # ids must be strings
        metadatas=[
            {"chapter": "3", "verse": "16"},
            {"chapter": "3", "verse": "5"},
            {"chapter": "3", "verse": "7"},  # placeholder so each document has metadata
        ],
        embeddings=[[1, 2, 3], [3, 4, 5], [5, 6, 7]],
    )

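ids: unique identifiers. Note that you must supply them yourself; unlike SQL, there is no auto-increment.
documents: the text data used to generate embeddings. You can supply text and Chroma creates the embeddings automatically, or you can supply the embeddings directly and store the text elsewhere.
embeddings: in my opinion the most important part of the database, because similarity search runs on them.
metadatas: any additional data you want to attach to an entry for extra context.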
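
Since there is no auto-increment, one common workaround is to generate the ids on the client. The sketch below is only an illustration under that assumption: `build_add_payload` is a hypothetical helper (not part of Chroma) that pairs each document with a UUID string and returns a dict shaped like the keyword arguments of collection.add().

```python
import uuid


def build_add_payload(documents, metadatas):
    """Pair each document with a freshly generated string id.

    Hypothetical helper: returns a dict shaped like the keyword
    arguments of collection.add().
    """
    if len(documents) != len(metadatas):
        raise ValueError("documents and metadatas must have the same length")
    # uuid4() stands in for SQL's AUTO_INCREMENT: unique, but not sequential.
    ids = [str(uuid.uuid4()) for _ in documents]
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


payload = build_add_payload(
    ["str1", "str2"],
    [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}],
)
print(len(payload["ids"]))
```

With a real collection, the result could then be passed along as collection.add(**payload).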

Now that the basics of a collection are clear, let's move on to CRUD operations and then look at how to query the database.

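CRUD Operations

Note: a collection is Chroma's equivalent of a table.

To create a collection we call create_collection() and then operate on it as needed. If the collection already exists and we need to reference it again, we must use get_collection(); calling create_collection() a second time raises an error.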

    CREATE TABLE tablename

    # create a collection
    collection = client.create_collection(name="name")

    # if the collection already exists and you need to use it again, use
    collection = client.get_collection(name="name")

    INSERT INTO tablename
    VALUES (..., ..., ...)

    collection.add(
        ids=["1"],  # ids must be strings
        documents=["some text"],
        metadatas=[{"key": "value"}],
        embeddings=[[1, 2, 3]],
    )

To update existing data, upsert it, or delete it, we can use the following commands:

    collection.update(
        ids=["2"],
        documents=["some text"],
        metadatas=[{"key": "value"}],
        embeddings=[[1, 2, 3]],
    )

    # If the id does not exist, update() does nothing.
    # To add the data when the id does not exist, use upsert():
    collection.upsert(
        ids=["2"],
        documents=["some text"],
        metadatas=[{"key": "value"}],
        embeddings=[[1, 2, 3]],
    )

    # To delete data, reference the ids, the metadata (where),
    # or the document content (where_document):
    collection.delete(
        where_document={"$contains": "some text"},
    )

    # Or delete from a set of ids, narrowed by a metadata filter:
    collection.delete(
        ids=["id1", "id2", "id3"],
        where={"chapter": "20"},
    )
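
For SQL readers, upsert is roughly what MySQL's INSERT ... ON DUPLICATE KEY UPDATE does. The difference between the two calls can be sketched with a plain dictionary. This is a toy model of the semantics described above, not Chroma's implementation, and the `update` and `upsert` functions here are stand-ins written for illustration.

```python
def update(store, ids, documents):
    """Toy model of update(): change only ids that already exist."""
    for id_, doc in zip(ids, documents):
        if id_ in store:
            store[id_] = doc
        # a missing id is skipped, mirroring "update does nothing"


def upsert(store, ids, documents):
    """Toy model of upsert(): overwrite existing ids, insert missing ones."""
    for id_, doc in zip(ids, documents):
        store[id_] = doc


store = {"1": "some text"}
update(store, ["2"], ["new text"])  # id "2" is absent, so nothing changes
after_update = dict(store)
upsert(store, ["2"], ["new text"])  # id "2" is absent, so it is inserted
print(after_update, store)
```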

Querying

Now let's look at what some queries look like.

    SELECT * FROM tablename
    SELECT * FROM tablename LIMIT value
    SELECT documents, metadata FROM tablename

    collection.get()
    collection.get(limit=val)
    collection.get(include=["documents", "metadatas"])
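
Besides limit, get() also accepts an offset argument, so SQL-style LIMIT/OFFSET paging carries over directly. A minimal sketch, assuming zero-based page numbers; `page_args` is a hypothetical helper name, not a Chroma function.

```python
def page_args(page, page_size):
    """Translate a zero-based page number into limit/offset keyword arguments."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"limit": page_size, "offset": page * page_size}


args = page_args(page=2, page_size=10)
print(args)  # {'limit': 10, 'offset': 20}
```

The result would be used as collection.get(**args), much like SELECT * FROM tablename LIMIT 10 OFFSET 20.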

get() is meant for fetching entries in bulk; for more advanced queries you need the query() method.

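    SELECT a, b FROM table
    LIMIT val

    collection.query(
        query_texts=["string"],  # query() needs query_texts or query_embeddings
        n_results=val,  # LIMIT
        include=["documents", "metadatas"],  # the columns a, b
    )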
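
One thing that tends to surprise SQL users is that query() returns columns rather than rows: as far as I understand the API, each field holds one inner list per query text, and fields left out of include come back as None. The sketch below reshapes such a result into row dictionaries. `rows_from_query` is a hypothetical helper, and the sample dict is hand-written to mimic that shape, not real output.

```python
def rows_from_query(results, query_index=0):
    """Turn the column-oriented result of one query into a list of row dicts."""
    rows = []
    ids = results["ids"][query_index]
    for position, id_ in enumerate(ids):
        row = {"id": id_}
        for column in ("documents", "metadatas", "distances"):
            values = results.get(column)
            # Columns that were not requested are None and get skipped.
            if values is not None:
                row[column] = values[query_index][position]
        rows.append(row)
    return rows


# Hand-written sample in the assumed result shape (not real output).
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["document about ai", "another document on ai"]],
    "metadatas": [[{"category": "ai"}, {"category": "ai"}]],
    "distances": None,
}
rows = rows_from_query(sample)
print(rows[0])
```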

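We now have three ways to filter data: similarity search (what vector databases are mainly used for), metadata filters, and document filters.

Similarity Search

We can search by text or by embedding and get back the most similar results.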

    collection.query(query_texts=["string"])
    collection.query(query_embeddings=[[1, 2, 3]])
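
To make "most similar" concrete, here is a brute-force sketch of what a similarity search computes. It assumes squared Euclidean (L2) distance, which I believe is Chroma's default metric; a real vector database uses an approximate index rather than sorting every row, and the function names are made up for this example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query_embedding, embeddings_by_id, n_results):
    """Return the n_results ids closest to the query, nearest first."""
    ranked = sorted(
        embeddings_by_id,
        key=lambda id_: squared_l2(query_embedding, embeddings_by_id[id_]),
    )
    return ranked[:n_results]


embeddings_by_id = {"1": [1, 2, 3], "2": [3, 4, 5], "3": [5, 6, 7]}
top = nearest([3, 4, 4], embeddings_by_id, n_results=2)
print(top)  # ['2', '1']
```

The distances from the query [3, 4, 4] are 9, 1, and 17 respectively, so id "2" ranks first and id "1" second.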

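In ChromaDB, the where and where_document parameters filter results during a query. These filters let you refine a similarity search by metadata or by document content.

Filtering by Metadata

The where parameter filters documents by their associated metadata. Metadata is typically a dictionary of key-value pairs that you supply when inserting documents.

Filter documents by metadata such as category, author, or date.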

    # insert documents with metadata
    collection.add(
        documents=["document about ai", "another document on ai", "general science content"],
        metadatas=[
            {"category": "ai", "author": "john"},
            {"category": "ai", "author": "doe"},
            {"category": "science", "author": "alice"},
        ],
        ids=["doc1", "doc2", "doc3"],
    )

    # query with a metadata filter
    results = collection.query(
        query_texts=["artificial intelligence"],
        n_results=5,
        where={"category": "ai"},  # only retrieve documents with category = "ai"
    )

    # output
    print("filtered documents:", results["documents"])

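    # You can filter on multiple conditions by combining them with $and:
    where={"$and": [{"category": "ai"}, {"author": "john"}]}

    # Operators such as $gt, $lt, and $in are supported. The comparison
    # operators expect numbers, so store dates numerically, for example:
    where={"date": {"$gt": 20240101}}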
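
SQL's WHERE a = x AND b = y corresponds to Chroma's $and operator, and as far as I know a where dict is expected to carry one condition at a time. A small builder can hide that detail. `all_of` below is a hypothetical helper written for this article, not a Chroma function.

```python
def all_of(**conditions):
    """Build a `where` filter that requires every key=value pair to match."""
    clauses = [{key: value} for key, value in conditions.items()]
    if not clauses:
        raise ValueError("at least one condition is required")
    if len(clauses) == 1:
        return clauses[0]  # a single condition needs no $and wrapper
    return {"$and": clauses}


where = all_of(category="ai", author="john")
print(where)
```

The resulting dict can be passed as the where argument of query(), get(), or delete().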

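Filtering by Document Content

The where_document parameter filters directly on the content of the documents.

Retrieve only the documents that contain a specific keyword.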

    # insert documents
    collection.add(
        documents=["ai is transforming the world", "ai and machine learning", "physics is fascinating"],
        ids=["doc1", "doc2", "doc3"],
    )

    # query with a document content filter
    results = collection.query(
        query_texts=["what is ai?"],
        n_results=5,
        where_document={"$contains": "ai"},  # only retrieve documents containing "ai"
    )

    # output
    print("filtered documents:", results["documents"])

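Key points:

Use operators such as $contains and $not_contains. Chroma does not provide $startswith or $endswith operators.
$contains: matches documents that contain the substring.
$not_contains: matches documents that do not contain the substring.

For example:

    where_document={"$not_contains": "ai"}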
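
Prefix matching, the equivalent of SQL's LIKE 'ai%', is not something I could find among Chroma's document operators. One option is to narrow the candidates with $contains on the server and finish the match in Python. A minimal sketch, assuming the flat result shape that get() returns; `filter_by_prefix` is a hypothetical helper and the sample dict is hand-written, not real output.

```python
def filter_by_prefix(results, prefix):
    """Keep (id, document) pairs whose document starts with `prefix`."""
    return [
        (id_, doc)
        for id_, doc in zip(results["ids"], results["documents"])
        if doc is not None and doc.startswith(prefix)
    ]


# Hand-written sample in the flat shape that get() returns (not real output).
sample = {
    "ids": ["doc1", "doc2", "doc3"],
    "documents": [
        "ai is transforming the world",
        "ai and machine learning",
        "physics is fascinating",
    ],
}
matches = filter_by_prefix(sample, "ai")
print(matches)
```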

Common use cases:

We can combine all three kinds of filtering like this:

Search within a specific category:

    collection.query(
        query_texts=["machine learning"],
        n_results=5,
        where={"category": "ml"},
    )

Search for documents that contain a specific term:

    collection.query(
        query_texts=["physics"],
        n_results=5,
        where_document={"$contains": "gravity"},
    )

Combine metadata and document content filters:

    collection.query(
        query_texts=["ai"],
        n_results=5,
        where={"author": "john"},  # values must match the stored metadata exactly
        where_document={"$contains": "ai"},
    )

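These filters improve the precision of similarity search, making ChromaDB a powerful tool for targeted document retrieval.

Conclusion

I wrote this article because I found the documentation lacking when I was trying to build my own program, and I hope it helps!

Thank you for reading. If you enjoyed this article, please like and share it. Also, if you are new to software architecture and want to learn more, I am starting a group-based cohort in which I will work personally with you and a small group to teach you everything about software architecture and design principles. If you are interested, you can fill in the form below. https://forms.gle/suaxrzryvbnv8ucga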


Published by PHP中文网. Please credit the source when reposting: https://www.chuangxiangniao.com/p/2176170.html


About the author

Contracted author at PHP中文网
