I have some large Parquet files of data in Iceberg (which I stored using Spark). My objective now is to pull these down using Spark, convert them into a Spark DataFrame, perform vector embedding to transform that DataFrame into a new DataFrame with embedded vector columns, and then store those vector columns in a vector database like Qdrant.
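For the embedding step, a common pattern is to run the model once per Spark partition (e.g. via `mapPartitions` or `mapInPandas`) so the driver never has to collect the full dataset. Below is a minimal sketch of that partition-level logic; the function names (`toy_embed`, `embed_partition`) are mine, and a deterministic hash-based "embedding" stands in for a real model call such as `SentenceTransformer.encode`:

```python
import hashlib
from typing import Dict, Iterator, List

EMBED_DIM = 8  # a real sentence-transformer model produces e.g. 384 dims


def toy_embed(text: str) -> List[float]:
    """Placeholder for a real model call (e.g. SentenceTransformer.encode).

    Deterministically hashes the text into EMBED_DIM floats in [0, 1]
    so the sketch is runnable without any ML dependencies.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(EMBED_DIM)]


def embed_partition(rows: Iterator[Dict]) -> Iterator[Dict]:
    """The function you would hand to Spark's mapPartitions.

    Loading the model here (once per partition) avoids re-loading it
    for every row and keeps the model off the driver entirely.
    """
    # model = SentenceTransformer("all-MiniLM-L6-v2")  # once per partition
    for row in rows:
        out = dict(row)
        out["embedding"] = toy_embed(row["text"])  # model.encode(...) in reality
        yield out


# Local demonstration of the per-partition logic:
rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
embedded = list(embed_partition(iter(rows)))
```

On a real cluster this would be applied as `df.rdd.mapPartitions(embed_partition)` (or as a `pandas_udf` returning `array<float>`), and the resulting DataFrame would then be written out through the Qdrant Spark connector; the exact connector options (format string, collection name, embedding field) should be checked against the connector's own README rather than taken from this sketch.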
I have had problems making this work so far, and online documentation on this specific topic is limited. I tried Spark NLP, but it appears incompatible with the qdrant-spark connector I used to let Qdrant act as a sink for Spark. So I guess I am looking for the conventional way to do these two things: generate embeddings on a Spark DataFrame, and write the resulting vector column to Qdrant.
I feel like the distributed nature of Spark is a big obstacle here.