Reputation: 3287
I have a list that is generated by a function. When I print
my list:
print(preds_labels)
I obtain:
[(0.,8.),(0.,13.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,19.),(0.,20.),(0.,21.),(0.,23.)]
but when I want to create a DataFrame
with this command:
df = sqlContext.createDataFrame(preds_labels, ["prediction", "label"])
I get an error message:
TypeError: not supported type: <type 'numpy.float64'>
If I create the list manually, I have no problem. Do you have an idea?
Upvotes: 7
Views: 11303
Reputation: 63
To anyone arriving here with the error:
TypeError: not supported type: <class 'numpy.str_'>
This is true for strings as well. So if you created your list of strings using numpy (e.g. a list of a single item repeated N times), convert it to pure Python first.
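As a minimal sketch of that conversion (the variable names here are invented for illustration, not taken from the question):

```python
import numpy as np

# Hypothetical list built with numpy: its elements are numpy.str_, not str
np_labels = list(np.repeat("spam", 3))

# Wrapping each element in str() yields plain Python strings,
# which pyspark's type system accepts
py_labels = [str(s) for s in np_labels]
```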
Upvotes: 1
Reputation: 2794
pyspark uses its own type system and unfortunately it doesn't deal with numpy well. It works with Python native types, though. So you can manually convert each numpy.float64
to float,
like:
df = sqlContext.createDataFrame(
    [(float(tup[0]), float(tup[1])) for tup in preds_labels],
    ["prediction", "label"]
)
Note that pyspark will then infer the columns as pyspark.sql.types.DoubleType.
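An alternative sketch of the same conversion (not from the answer above, just numpy's documented API): every numpy scalar has an .item() method that returns the closest native Python type, which avoids hard-coding float() per column:

```python
import numpy as np

# Sample data in the same shape as the question's preds_labels
preds_labels = [(np.float64(0.0), np.float64(8.0)),
                (np.float64(0.0), np.float64(13.0))]

# .item() converts each numpy scalar to its plain Python equivalent (float here),
# so the resulting tuples are safe to pass to sqlContext.createDataFrame
clean = [tuple(x.item() for x in tup) for tup in preds_labels]
```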
Upvotes: 17