I am trying to take an RDD of scipy sparse matrices that looks like:
[<1x24000 sparse matrix of type '' with 10 stored elements in Compressed Sparse Row format>, . . . ]
and ideally turn it into a dataframe that looks like:
<code>
+-----+-----+-----+
|  A  |  B  |  C  |
+-----+-----+-----+
| 1.0 | 0.0 | 0.0 |
+-----+-----+-----+
| 1.0 | 1.0 | 0.0 |
+-----+-----+-----+
</code>
However, I keep getting this:
<code>
+---------------+
|             _1|
+---------------+
|[1.0, 0.0, 0.0]|
+---------------+
|[1.0, 1.0, 0.0]|
+---------------+
</code>
I am having the darnedest time because each row is filled with numpy arrays.
I used this code to create the dataframe from the rdd:
<code>res.flatMap(lambda x: np.array(x.todense())).map(list).map(lambda l : Row([float(x) for x in l])).toDF()</code>
Explode does not help (it puts everything into the same column).
I also tried using a UDF on the resulting dataframe, but I cannot seem to separate the numpy array into individual values.
Please help!
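For reference, here is a minimal sketch of what I think the target conversion looks like. My understanding is that `Row([...])` receives the whole list as a single positional argument, which would explain the single array column `_1`; producing one tuple of floats per matrix row (or `Row(*values)`) should give one column per element instead. The helper names `sparse_to_tuples` and `to_wide_df` and the column names are hypothetical, and the sketch assumes `res` is the RDD above and that a SparkSession is already active so that `toDF` is available on RDDs.

```python
def sparse_to_tuples(m):
    """Turn one sparse matrix (n x k) into a list of n tuples of k Python floats."""
    # toarray() yields a plain 2-D ndarray; tolist() converts it to nested Python lists.
    # float(v) guards against integer dtypes so every column is inferred as a double.
    return [tuple(float(v) for v in row) for row in m.toarray().tolist()]


def to_wide_df(rdd, column_names):
    """Hypothetical helper: build a DataFrame with one column per matrix column."""
    # Each record is now a flat tuple, so toDF creates len(column_names) columns
    # rather than a single array column.
    return rdd.flatMap(sparse_to_tuples).toDF(column_names)
```

With three columns this would be called as `to_wide_df(res, ["A", "B", "C"])`; for the 24000-column case the names would have to be generated, for example `["c%d" % i for i in range(24000)]`.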