indexers

indexers

This module provides functionality for creating a vector index from a CSV (text) file. It defines the VectorStore class, which is used to model and create vector databases from CSV text files using a vectoriser object.

This class requires a Vectoriser object from the vectorisers submodule, to convert the CSV’s text data into vector embeddings which are then stored in the VectorStore objects.

Key Features: - Batch processing of input files to handle large datasets. - Support for CSV file format (additional formats may be added in future updates). - Integration with a custom embedder for generating vector embeddings. - Logging for tracking progress and handling errors during processing.

VectorStore Class: - The VectorStore class is initialized with a vectoriser object and a CSV knowledgebase. - Additional columns in the CSV may be specified as metadata to be included in the vector database. - Upon creation, the VectorStore is saved in parquet format for efficient, and quick reloading via the VectorStore’s .from_filespace() method. - A new piece of text data (or label) can be queried against the VectorStore in the following ways: - .search(): to find the most semantically similar pieces of text in the vector database. - .reverse_search(): to find all examples in the knowledgebase that have a given label. - .embed(): to generate a vector embedding for a given piece of text data. - ‘Hook’ methods may be specified to perform pre-processing on input data before embedding, and post-processing on the output of the search methods.