Importing Vector Embedding Models….

One way to create vector embeddings is to have a domain expert quantify a predefined set of features or dimensions, such as shape, texture, color, or sentiment, depending on the type of object you are dealing with. However, how well this works depends on the use case, and it is not always cost-effective.
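
As a minimal sketch of the hand-crafted approach, assume a domain expert has scored a few fruits on three made-up features (roundness, smoothness, redness) from 0.0 to 1.0. Each fruit's scores form its vector. Cosine similarity, a common way to compare embeddings, then shows which items are alike. The feature names, the scores, and the cosine_similarity helper are illustrative assumptions, not part of any Oracle API.

```python
import math

# Hypothetical expert-assigned scores, one dimension per feature:
# (roundness, smoothness, redness), each between 0.0 and 1.0.
hand_crafted = {
    "apple": [0.9, 0.8, 0.9],
    "cherry": [1.0, 0.9, 1.0],
    "banana": [0.1, 0.7, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    apple = hand_crafted["apple"]
    print(cosine_similarity(apple, hand_crafted["cherry"]))  # close to 1.0
    print(cosine_similarity(apple, hand_crafted["banana"]))  # noticeably lower
```

The cost problem is visible even in this toy: every new kind of object needs someone to decide which features matter and to score each item by hand.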

In practice, vector embeddings are usually created with neural networks. Most modern embedding models are based on the transformer architecture, as illustrated in the diagram above, but convolutional neural networks can also be used.
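
A transformer produces one output vector per input token rather than one per sentence. Sentence-embedding models commonly reduce these to a single fixed-length vector by pooling. The sketch below shows mean pooling, which averages the token vectors and skips padding positions marked 0 in an attention mask. The mean_pool helper and the tiny two-dimensional token vectors are made up for illustration: real transformer outputs have hundreds of dimensions, and some models pool differently (for example, by taking the vector of a special classification token).

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask value 1), ignoring padding (0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask does not select any tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    # Three token positions; the last one is padding.
    tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Whatever the input length, the pooled result has the same number of dimensions as each token vector, which is why the model's output size fixes the dimension count of its embeddings.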

Depending on the type of your data, you can use different pretrained, open-source models to create vector embeddings. For example:

• For textual data, sentence transformer models convert words, sentences, or paragraphs into vector embeddings.

• For visual data, you can use a Residual Network (ResNet) to generate vector embeddings.

• For audio data, you can convert the audio into a spectrogram, a visual representation of its frequencies over time, and then handle it as visual data.
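
To illustrate the audio case, the sketch below turns a list of audio samples into a magnitude spectrogram. The signal is cut into overlapping frames, each frame is weighted with a Hann window, and a discrete Fourier transform gives the strength of each frequency. The result is a 2-D grid (time × frequency) that can be treated like an image. The spectrogram function, frame size, and hop length are assumptions chosen for the example, and the naive DFT is written out for clarity. In practice you would use an FFT from a library such as NumPy or SciPy.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram as a list of frames (time x frequency).

    Each frame is multiplied by a Hann window and transformed with a naive DFT;
    only the non-negative frequency bins (frame_size // 2 + 1) are kept.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    # A 1000 Hz tone sampled at 8000 Hz; with 64-sample frames each
    # frequency bin is 8000 / 64 = 125 Hz wide, so the tone falls in bin 8.
    rate, freq = 8000, 1000
    tone = [math.sin(2 * math.pi * freq * n / rate) for n in range(800)]
    frames = spectrogram(tone)
    peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
    print(len(frames), len(frames[0]), peak_bin)  # 24 33 8
```

Each row of the result is one moment in time and each column one frequency band, so the grid can be scaled into pixel values and passed to an image model such as ResNet.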

Each model also determines the number of dimensions for your vectors. For example:

• Cohere’s embedding model embed-english-v3.0 has 1024 dimensions.

• OpenAI’s embedding model text-embedding-3-large has 3072 dimensions.

• The all-MiniLM-L6-v2 model, available on Hugging Face, has 384 dimensions.

• You can also create your own model, trained on your own data set.
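
Vectors from different models have different lengths and cannot be compared with each other, so it can help to check each vector against the model that produced it before storing it. The sketch below hard-codes the three dimension counts listed above. The EMBEDDING_DIMENSIONS table and the check_dimensions helper are hypothetical, and you should confirm the numbers against each provider's current documentation, since some models can also be configured to return shorter vectors.

```python
# Dimension counts as listed in the text; verify against each provider's docs.
EMBEDDING_DIMENSIONS = {
    "embed-english-v3.0": 1024,      # Cohere
    "text-embedding-3-large": 3072,  # OpenAI
    "all-MiniLM-L6-v2": 384,         # sentence-transformers model on Hugging Face
}


def check_dimensions(model_name, vector):
    """Return the vector unchanged if its length matches the model, else raise."""
    expected = EMBEDDING_DIMENSIONS.get(model_name)
    if expected is None:
        raise KeyError(f"unknown embedding model: {model_name}")
    if len(vector) != expected:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, got {len(vector)}"
        )
    return vector


if __name__ == "__main__":
    check_dimensions("all-MiniLM-L6-v2", [0.0] * 384)  # accepted
    try:
        check_dimensions("all-MiniLM-L6-v2", [0.0] * 1024)
    except ValueError as err:
        print(err)  # all-MiniLM-L6-v2 produces 384-dimensional vectors, got 1024
```

In Oracle Database the same constraint can also be declared on the column itself, for example VECTOR(384, FLOAT32), so that vectors of the wrong length are rejected when they are inserted.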

Although you can generate vector embeddings outside Oracle Database using pretrained open-source embedding models or your own models, you can also import those models directly into Oracle Database if they are compatible with the Open Neural Network Exchange (ONNX) standard. Because Oracle Database implements an ONNX runtime directly within the database, you can then generate vector embeddings inside the database using SQL.
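
As a rough sketch of the in-database workflow, the block below keeps two pieces of SQL in Python. The first is a PL/SQL call to DBMS_VECTOR.LOAD_ONNX_MODEL, modelled on the example in Oracle's documentation, that imports an ONNX file from a database directory object. The second is a query that calls the VECTOR_EMBEDDING SQL function on the imported model. The directory DM_DUMP, the file my_embedding_model.onnx, the model name doc_model, and the embed_in_database helper are placeholders, and the JSON metadata must match your own model's input and output names. The helper accepts any DB-API style cursor, such as one from python-oracledb.

```python
import re

# Placeholder names throughout: DM_DUMP (directory object), the ONNX file name,
# and doc_model. Run this once, e.g. with cursor.execute(LOAD_MODEL_PLSQL).
LOAD_MODEL_PLSQL = """
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    'DM_DUMP',
    'my_embedding_model.onnx',
    'doc_model',
    JSON('{"function": "embedding", "embeddingOutput": "embedding", "input": {"input": ["DATA"]}}'));
END;"""

# Conservative check for an unquoted Oracle identifier.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,127}")


def embed_in_database(cursor, model_name, text):
    """Return the embedding that the in-database ONNX model computes for text."""
    # A model name is an identifier and cannot be a bind variable, so it is
    # validated before being placed in the SQL text; the input text is bound.
    if not _IDENTIFIER.fullmatch(model_name):
        raise ValueError(f"invalid model name: {model_name!r}")
    sql = f"SELECT VECTOR_EMBEDDING({model_name} USING :input_text AS data) FROM dual"
    cursor.execute(sql, {"input_text": text})
    (vector,) = cursor.fetchone()
    return vector
```

Because the model runs inside the database, the same VECTOR_EMBEDDING call can also appear directly in INSERT or UPDATE statements, so stored documents can be embedded without the data leaving the database.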