Reputation: 1944
I have a SparseVector in PySpark that looks like this:
SparseVector(5,{1:5,2:3,3:5,4:3,5:2})
How can I convert it to a pandas DataFrame with two columns that looks like this?
ID VALUE
1 5
2 3
3 5
4 3
5 2
I tried sparsevector.zipWithIndex(), but it did not work.
Upvotes: 1
Views: 5643
Reputation: 43504
Your example vector is malformed: you've specified a size of 5, so the valid indices are 0 through 4 and there cannot be an index 5. After you fix that, you can simply call toArray(), which returns a numpy.ndarray. Just pass that to the pandas.DataFrame constructor.
from pyspark.mllib.linalg import SparseVector  # RDD-based API
#from pyspark.ml.linalg import SparseVector  # DataFrame-based API; the code works the same
import pandas as pd
a = SparseVector(5,{0:5,1:3,2:5,3:3,4:2}) # note the index starts at 0
df = pd.DataFrame(a.toArray())
print(df)
# 0
#0 5.0
#1 3.0
#2 5.0
#3 3.0
#4 2.0
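The single unnamed column above can be turned into the two named columns from the question by moving the DataFrame's index into its own column. A minimal sketch follows; the function name to_id_value_frame and its start parameter are hypothetical, and the only SparseVector method it relies on is toArray(), so unstored positions come out as 0.0 rows.

```python
import pandas as pd


def to_id_value_frame(vec, start=0):
    """Dense conversion: one (ID, VALUE) row per position of `vec`.

    `vec` is assumed to be a pyspark SparseVector from either pyspark.ml.linalg
    or pyspark.mllib.linalg; only its toArray() method is used, so positions
    that are not stored show up as 0.0 rows.
    `start` (hypothetical parameter) is the first ID: 0 keeps Spark's 0-based
    indices, 1 reproduces the 1-based numbering shown in the question.
    """
    df = pd.DataFrame(vec.toArray(), columns=["VALUE"])
    df.index = df.index + start  # shift the RangeIndex if 1-based IDs are wanted
    return df.rename_axis("ID").reset_index()  # move the index into an "ID" column
```

With the corrected vector from the answer, to_id_value_frame(a, start=1) should give IDs 1 to 5 alongside the values 5.0, 3.0, 5.0, 3.0, 2.0, matching the layout in the question.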
The code works the same whether you're working with pyspark.mllib.linalg.SparseVector or pyspark.ml.linalg.SparseVector.
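If the vector is large and only a few entries are set, building the dense array with toArray() may be wasteful. Both SparseVector classes also expose their stored entries through the indices and values attributes, so a sketch like the one below (the function name stored_entries_frame is hypothetical) emits rows only for the stored positions. Unlike the toArray() approach, implicit zeros do not appear in the result.

```python
import pandas as pd


def stored_entries_frame(vec):
    """Sparse conversion: (ID, VALUE) rows only for explicitly stored entries.

    Both pyspark.ml.linalg.SparseVector and pyspark.mllib.linalg.SparseVector
    expose `indices` and `values` arrays, so the dense array is never built.
    IDs are Spark's 0-based positions.
    """
    return pd.DataFrame({"ID": vec.indices, "VALUE": vec.values})
```

For example, stored_entries_frame(SparseVector(6, {1: 5, 3: 2})) should return just two rows, with IDs 1 and 3.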
Upvotes: 3