what are embeddings
An embedding is a vector representation created by a machine learning model that captures meaning and context.
It preserves distance-based information, which means the similarity (distance) between two objects can be computed directly from their embeddings.
The earliest models supported text in only a single language (mostly English), but multilingual and multimodal models are becoming more common.
Some examples from MTEB (Massive Text Embedding Benchmark) are:
- Qwen/Qwen3-Embedding-8B from the Qwen series by Alibaba
- microsoft/harrier-oss-v1-27b from Microsoft
- nvidia/llama-embed-nemotron-8b from Nvidia
- tencent/KaLM-Embedding-Gemma3-12B-2511 from Tencent
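
A minimal sketch of the similarity idea above, assuming cosine similarity as the measure (the notes do not name one). The three vectors are made-up 3-dimensional toys, not output from any of the models listed; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical hand-written "embeddings", for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
train = [0.0, 0.2, 0.9]

# Related items should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, train))
```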
what to embed / the data
- Milli Archives Foundation
- this is soooo wonderful! I am so happy to have come across it finally
- 3 collections as of now
- might be interesting to volunteer with them and talk about wimb and personal archives etc
flow
- generate embeddings
- store in a vector database
- query vector db
  - highlighting text on screen
- get a list of files as response
  - use the topmost file
  - use the list to show all the files and then choose randomly
- when to switch media, timing information
  - if there is access to beat information, use it to sync the switch timing
- have bins which hold refs to specific media and switch between them
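
A rough sketch of the flow above in plain Python, under several assumptions. Everything named here is hypothetical: `InMemoryVectorStore` stands in for a real vector database, the file names and 3-dimensional vectors are hand-written placeholders for real media and model output, and `next_switch_time` assumes the beat information is available as a fixed BPM.

```python
import math
import random


class InMemoryVectorStore:
    """Stand-in for a vector database: brute-force cosine search over a list."""

    def __init__(self):
        self.items = []  # (file_ref, vector) pairs

    def add(self, file_ref, vector):
        self.items.append((file_ref, vector))

    def query(self, vector, k=3):
        """Return up to k file refs, most similar first."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        ranked = sorted(self.items, key=lambda item: cosine(vector, item[1]),
                        reverse=True)
        return [ref for ref, _ in ranked[:k]]


def pick_file(results, mode="top", rng=random):
    """Take the topmost file, or choose randomly from the returned list."""
    if not results:
        return None
    return results[0] if mode == "top" else rng.choice(results)


def next_switch_time(now, bpm, beats_per_switch=4, start=0.0):
    """Next time in seconds that lands on a multiple of beats_per_switch beats."""
    interval = 60.0 / bpm * beats_per_switch
    return start + (math.floor((now - start) / interval) + 1) * interval


class MediaBins:
    """Named bins holding refs to media; one bin is active at a time."""

    def __init__(self, bins):
        self.bins = bins
        self.names = list(bins)
        self.index = 0

    def active(self):
        return self.bins[self.names[self.index]]

    def switch(self):
        """Move to the next bin, wrapping around, and return its refs."""
        self.index = (self.index + 1) % len(self.names)
        return self.active()


# In practice these vectors would come from an embedding model.
store = InMemoryVectorStore()
store.add("letters/001.jpg", [0.9, 0.1, 0.0])
store.add("photos/014.jpg", [0.1, 0.9, 0.1])
store.add("audio/007.wav", [0.0, 0.2, 0.9])

# The highlighted on-screen text would be embedded into a query vector.
results = store.query([0.8, 0.2, 0.1], k=2)
chosen = pick_file(results, mode="top")

bins = MediaBins({"matches": results, "fallback": ["audio/007.wav"]})
print(chosen, bins.active(), next_switch_time(now=5.3, bpm=120))
```

Swapping `mode="top"` for `mode="random"` gives the second selection option from the list; passing a seeded `random.Random` as `rng` makes that choice repeatable.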