23ai Vector Indexes

What is a vector index? It is a specialized data structure designed for similarity searches over high-dimensional vector spaces. It accelerates vector similarity searches by using techniques such as partitioning, clustering, and neighbor graphs, which group similar vectors together in order to reduce the search space.
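To keep the later examples concrete, assume a hypothetical table with a VECTOR column like the one below; the table name, column names, and the 768-dimension FLOAT32 embedding are illustrative assumptions, not something from a specific application.

-- Hypothetical table storing document embeddings in Oracle Database 23ai
CREATE TABLE docs (
  doc_id    NUMBER PRIMARY KEY,
  title     VARCHAR2(200),
  embedding VECTOR(768, FLOAT32)   -- 768-dimensional vectors, one per document
);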

So why would you want to use a vector index? Exact similarity search can be slow: without an index, every query must be compared against every vector in the table. A vector index drastically reduces that search space, which improves query performance and enables efficient approximate nearest neighbor (ANN) searches across large vector data sets.
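As a minimal sketch of such an approximate search against the hypothetical docs table above, the query below asks the optimizer to use an ANN search rather than an exact scan; :query_vector is assumed to be a bind variable holding the query embedding.

-- Approximate nearest neighbor search (top 10 most similar documents)
SELECT doc_id, title
FROM   docs
ORDER  BY VECTOR_DISTANCE(embedding, :query_vector, COSINE)
FETCH  APPROX FIRST 10 ROWS ONLY;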

So we have two different types of indexes. Let's take a look at the first one: HNSW, or Hierarchical Navigable Small World. This index is stored in memory, inside the SGA (System Global Area). It is very efficient for vector similarity searches and uses a layered, hierarchical organization. To help you picture this, the graphic on the right shows multiple layers: the entry layer with an entry point, a middle layer (layer 1), and layer 0, which is where the nearest neighbor is found.

So imagine this: highways (the upper layers) connect major cities, while local roads (the lower layers) provide detailed navigation. An HNSW search starts on the highways to quickly get close to the target, then uses the local roads for precise results. HNSW is the best choice when the data fits in memory.
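Here is a minimal sketch of creating an HNSW index on the hypothetical docs table; the index name, distance metric, and target accuracy are illustrative assumptions you would tune for your own data.

-- In-memory HNSW (neighbor graph) vector index
CREATE VECTOR INDEX docs_hnsw_idx
  ON docs (embedding)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  DISTANCE COSINE
  WITH TARGET ACCURACY 95;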

Now, let's take a look at the second type of index, the IVF index, or Inverted File Flat. This is a partition-based approach that balances search quality with speed, and it is better suited for larger data sets. As an analogy, think of a library with books organized by subject, where each subject is a partition. If you're looking for a specific topic, you only need to search the relevant section, that is, the relevant partition, not the entire library. It also supports DML operations.
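A minimal sketch of creating an IVF index follows the same pattern; again, the index name, distance metric, and target accuracy are illustrative assumptions.

-- Partition-based IVF (neighbor partitions) vector index
CREATE VECTOR INDEX docs_ivf_idx
  ON docs (embedding)
  ORGANIZATION NEIGHBOR PARTITIONS
  DISTANCE COSINE
  WITH TARGET ACCURACY 90;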

Now, just to show you where the vector pool is stored, I wanted to show you this graphic. The vector pool is the memory area designed for HNSW vector indexes and their associated metadata, and it sits inside the System Global Area (SGA). It also speeds up operations related to Inverted File Flat indexes. It is configured using the VECTOR_MEMORY_SIZE parameter and can be modified at either the CDB (Container Database) level or the PDB (Pluggable Database) level. At the bottom of the screen you can see the ALTER SYSTEM syntax used to set the VECTOR_MEMORY_SIZE parameter.
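Since the graphic itself is not reproduced here, the statements below are a minimal sketch of that ALTER SYSTEM syntax; the sizes and scopes shown are illustrative assumptions, and you should check your release's documentation for when a restart is required.

-- At the CDB level: record the new size in the SPFILE (typically takes effect after a restart)
ALTER SYSTEM SET vector_memory_size = 1G SCOPE=SPFILE;

-- At the PDB level: illustrative size, limited by the value set at the CDB level
ALTER SYSTEM SET vector_memory_size = 512M;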

I hope this helps in understanding vector indexes. In the next post, we will walk through full examples of creating them.