Reputation: 23
I have a DataFrame user_recommended, as shown in the picture. The recommendations
column is a PySpark RDD like the one shown below:
In[10]: user_recommended.recommendations[0]
Out[10]: [Row(item=0, rating=0.005226806737482548),
Row(item=23, rating=0.0044402251951396465),
Row(item=4, rating=0.004139747936278582)]
I want to convert the recommendations RDD to a Python list.
Is there a script that can convert the recommendations column of the user_recommended
DataFrame (note that it is of type pandas.core.frame.DataFrame) to a list?
Upvotes: 0
Views: 204
Reputation: 2696
Another, slightly different approach. Its value, in my view, is that it generalises more easily to Rows
with more than two elements. Also, it is worth noting that the data structure you preview in your question is a pandas DataFrame with a column consisting of lists of PySpark Row
objects; it is not in fact an RDD.
import pandas as pd
from pyspark.sql import Row
# recreate the individual entries of the recommendation column
# these are lists of pyspark Row data structures
df_recommend = pd.DataFrame({'recommendations': (
[Row(item=0, rating=0.005226806737482548),
Row(item=23, rating=0.0044402251951396465),
Row(item=4, rating=0.004139747936278582)],)})
# now extract the values using the asDict method of the Row
df_recommend['extracted_values'] = (
df_recommend['recommendations']
.apply(lambda recs: [list(x.asDict().values()) for x in recs])
)
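If plain Python lists are the end goal, the same idea can be sketched without asDict. This assumes only that each entry behaves like a tuple with named fields, which holds for pyspark.sql.Row (it is a tuple subclass). So that the snippet runs without Spark installed, a namedtuple called FakeRow stands in for Row here, and rows_to_lists is a hypothetical helper name, not part of any library.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row so the sketch runs without Spark.
# Assumption: Row is a tuple subclass, so list(row) works the same on real Rows.
FakeRow = namedtuple("FakeRow", ["item", "rating"])


def rows_to_lists(recs):
    """Convert a list of Row-like tuples into a list of plain Python lists."""
    return [list(row) for row in recs]


# One cell of the recommendations column, copied from the question
recommendations = [
    FakeRow(item=0, rating=0.005226806737482548),
    FakeRow(item=23, rating=0.0044402251951396465),
    FakeRow(item=4, rating=0.004139747936278582),
]

plain = rows_to_lists(recommendations)   # [[item, rating], ...]
items = [row[0] for row in plain]        # just the item ids
```

With the real DataFrame, something like user_recommended['recommendations'].apply(rows_to_lists) should apply the same conversion to every row of the column.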
Upvotes: 0
Reputation: 1464
I suppose you want to do this:
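from pyspark import SparkContext
from pyspark.sql import Row
# reuse the running SparkContext, or start one if none exists
sc = SparkContext.getOrCreate()
my_rdd = sc.parallelize([Row(item=0, rating=0.005226806737482548),
Row(item=23, rating=0.0044402251951396465),
Row(item=4, rating=0.004139747936278582)])
my_rdd.collect()  # returns a list of Row objects
# map each Row to a plain (item, rating) tuple
new_rdd = my_rdd.map(lambda x: (x[0], x[1]))
new_rdd.collect()  # returns a list of tuples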
Upvotes: 1