sjishan

Reputation: 3652

PySpark: Cannot create dataframe from list

Hi, I have a list of tuples, each containing a string and a numpy.float64 value. I would like to convert it to a Spark DataFrame, but I am getting errors. The list and the error are shown below.

[Image: the list and the error]

This is my code:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([StructField("key", StringType(), True), StructField("value", DoubleType(), True)])

coef_df = spark.createDataFrame(coef_list, schema)

Upvotes: 0

Views: 784

Answers (1)

James Tobin

Reputation: 3110

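As @user6910411 suggests, Spark SQL doesn't support NumPy types (yet), so values such as numpy.float64 have to be converted to plain Python types before the DataFrame is created.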

Here is a slightly simpler solution for you (incorporating the comment as well):

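import numpy as np

# Sample rows with NumPy scalar types, similar to the data in the question
data = [
    (np.str_('100912strategy_id'), np.float64(-2.1412)),
    (np.str_('10exchange_ud'), np.float64(-1.2412))]

# Cast each field to a plain Python type before building the DataFrame
# (sc is assumed to be an existing SparkContext)
df = (sc.parallelize(data)
    .map(lambda x: (str(x[0]), float(x[1])))
    .toDF(["key","value"]))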
df.show()
+-----------------+-------+
|              key|  value|
+-----------------+-------+
|100912strategy_id|-2.1412|
|    10exchange_ud|-1.2412|
+-----------------+-------+
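
If you would rather not go through an RDD, the same cast can probably be done on the driver before calling createDataFrame. This is only a rough sketch: to_native and make_coef_df are hypothetical helper names, coef_list here is made-up sample data shaped like the list in the question, and spark is assumed to be an existing SparkSession.

```python
import numpy as np


def to_native(rows):
    """Convert NumPy scalars in each tuple to plain Python values."""
    return [
        tuple(v.item() if isinstance(v, np.generic) else v for v in row)
        for row in rows
    ]


# Hypothetical sample shaped like the list in the question.
coef_list = [
    ("100912strategy_id", np.float64(-2.1412)),
    ("10exchange_ud", np.float64(-1.2412)),
]

# Every value is now a built-in str or float.
native_rows = to_native(coef_list)


def make_coef_df(spark, rows):
    # `spark` is assumed to be an existing SparkSession; the column
    # names are passed as a list, so Spark infers the column types.
    return spark.createDataFrame(rows, ["key", "value"])
```

Calling make_coef_df(spark, native_rows) should then give the same two-column DataFrame as above. The explicit StructType schema from the question can be passed instead of the column-name list if you want to pin the types.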

Upvotes: 2
