Reputation: 8461
I have SparseVectors generated from an IDF transformation that look like:
user='1234', idf=SparseVector(174, {0: 0.4709, 5: 0.8967, 7: 0.9625, 8: 0.9814,...})
I would like to explode this into something like:
|index|rating|user|
|0    |0.4709|1234|
|5    |0.8967|1234|
|7    |0.9625|1234|
|8    |0.9814|1234|
...
My objective is to take these (index, value) tuples and perform an ALS step.
Upvotes: 0
Views: 597
Reputation: 4631
This task requires a UserDefinedFunction (UDF):
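from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, explode
from pyspark.ml.linalg import SparseVector, DenseVector

# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ('1234', SparseVector(174, {0: 0.4709, 5: 0.8967, 7: 0.9625, 8: 0.9814}))
]).toDF("user", "idf")

# Convert a vector into an {index: value} map so it can be exploded into rows
@udf("map<long, double>")
def vector_as_map(v):
    if isinstance(v, SparseVector):
        return dict(zip(v.indices.tolist(), v.values.tolist()))
    elif isinstance(v, DenseVector):
        return dict(zip(range(len(v)), v.values.tolist()))

# explode() on a map column yields one (key, value) row per entry
df.select("user", explode(vector_as_map("idf")).alias("index", "rating")).show()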
which gives the expected result (row order may vary):
+----+-----+------+
|user|index|rating|
+----+-----+------+
|1234|    0|0.4709|
|1234|    8|0.9814|
|1234|    5|0.8967|
|1234|    7|0.9625|
+----+-----+------+
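To see what the UDF and explode do per row without starting Spark, here is a minimal plain-Python sketch. The helper names (sparse_to_map, dense_to_map, explode_rows) are hypothetical and not part of PySpark; plain lists stand in for the vector's indices and values.

```python
from typing import Dict, Iterable, List, Tuple


def sparse_to_map(indices: Iterable[int], values: Iterable[float]) -> Dict[int, float]:
    """Pair each stored index with its value (the SparseVector branch)."""
    return dict(zip(indices, values))


def dense_to_map(values: Iterable[float]) -> Dict[int, float]:
    """Number every position 0..n-1 (the DenseVector branch)."""
    return dict(enumerate(values))


def explode_rows(user: str, mapping: Dict[int, float]) -> List[Tuple[str, int, float]]:
    """Emit one (user, index, rating) row per map entry, mimicking explode on a map."""
    return [(user, index, rating) for index, rating in mapping.items()]


# Hypothetical stand-in for the SparseVector from the question
rows = explode_rows(
    "1234",
    sparse_to_map([0, 5, 7, 8], [0.4709, 0.8967, 0.9625, 0.9814]),
)
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the exploded DataFrame. Note that the dense branch also emits zero entries, so you may want to filter those out before feeding the rows to ALS.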
Upvotes: 4