Moritz

Reputation: 563

How can I create a dataframe from this dict

I know this is a rather basic question, but for the life of me I can't get it to work and I am running out of time, so: I have a dict that looks like this:

data_dict = {'timestamp': '2019-05-01T06:00:00-04:00', 'data': [0.37948282157787916, 1.5890471705541012, 2.1883813840381885], '_id': '62377385587e549976adfda0'}

How can I create a dataframe from it? I tried:

schema = StructType([
  StructField('timestamp', TimestampType(), True),
  StructField('data', ArrayType(DecimalType()), True),
  StructField('_id', StringType(), True)
  ])
df = spark.createDataFrame(data=data_dict, schema=schema)

This gives me the error:

TypeError: StructType can not accept object 'timestamp' in type <class 'str'>

But even when I shrink the dict and remove the timestamp from both the dict and the schema, I get a similar error:

TypeError: StructType can not accept object 'data' in type <class 'str'>

Any help is greatly appreciated, thanks a lot in advance!

Edit: I just figured out that by putting [] around the dict, I get it to work. However, if anyone has a less ugly solution, I'll buy it.
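
A small pure-Python sketch of why the brackets seem to matter, assuming createDataFrame simply iterates over whatever it receives and treats each element as one row. Iterating a bare dict yields its keys, which would explain why the error complains about the string 'timestamp' (and later 'data'). The variable names rows_from_dict and rows_from_list are only for illustration.

```python
data_dict = {
    'timestamp': '2019-05-01T06:00:00-04:00',
    'data': [0.37948282157787916, 1.5890471705541012, 2.1883813840381885],
    '_id': '62377385587e549976adfda0',
}

# Iterating a dict yields its keys, so each "row" is just a key string.
rows_from_dict = list(data_dict)

# Wrapping the dict in a list yields a single element: the whole record.
rows_from_list = list([data_dict])

print(rows_from_dict)
print(len(rows_from_list), rows_from_list[0] is data_dict)
```

So the list is not really a workaround: it is how a collection of one record is spelled.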

Upvotes: 0

Views: 190

Answers (1)

Frau P

Reputation: 144

You can cast the columns to the desired types once the data is in the DataFrame and then, if needed, explode the data column so that each value of the array gets its own row.

from pyspark.sql.functions import col, explode
from pyspark.sql.types import TimestampType
data_dict = {'timestamp': '2019-05-01T06:00:00-04:00', 'data': [0.37948282157787916, 1.5890471705541012, 2.1883813840381885], '_id': '62377385587e549976adfda0'}
df = spark.createDataFrame([data_dict]).select('_id', explode('data').alias('data'), col('timestamp').cast(TimestampType()))
_id                       data                 timestamp
62377385587e549976adfda0  0.37948282157787916  2019-05-01T06:00:00-04:00
62377385587e549976adfda0  1.5890471705541012   2019-05-01T06:00:00-04:00
62377385587e549976adfda0  2.1883813840381885   2019-05-01T06:00:00-04:00
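
If you would rather keep an explicit schema, as in the question, a minimal sketch could look like the one below. It rests on a few assumptions that are mine, not the asker's: the timestamp string is parsed into a datetime first (schema verification for TimestampType does not accept a plain str), and DoubleType is swapped in for DecimalType, because DecimalType() defaults to precision 10 and scale 0 and expects decimal.Decimal values rather than floats. The helpers to_typed_row and build_df are hypothetical names, and spark is assumed to be an existing SparkSession.

```python
from datetime import datetime

data_dict = {
    'timestamp': '2019-05-01T06:00:00-04:00',
    'data': [0.37948282157787916, 1.5890471705541012, 2.1883813840381885],
    '_id': '62377385587e549976adfda0',
}


def to_typed_row(d):
    """Return a copy of the record with Python types matching the schema."""
    return {
        # ISO-8601 string -> timezone-aware datetime
        'timestamp': datetime.fromisoformat(d['timestamp']),
        'data': [float(x) for x in d['data']],
        '_id': d['_id'],
    }


def build_df(spark, d):
    """Create a one-row DataFrame with an explicit schema (assumed layout)."""
    # Imported lazily so to_typed_row can be tried without Spark installed.
    from pyspark.sql.types import (ArrayType, DoubleType, StringType,
                                   StructField, StructType, TimestampType)
    schema = StructType([
        StructField('timestamp', TimestampType(), True),
        StructField('data', ArrayType(DoubleType()), True),  # assumption: doubles, not decimals
        StructField('_id', StringType(), True),
    ])
    # The record is wrapped in a list: one list element per row.
    return spark.createDataFrame([to_typed_row(d)], schema=schema)


row = to_typed_row(data_dict)
print(row['timestamp'].isoformat())
```

Calling build_df(spark, data_dict) should then give a single row with the array kept intact; explode can still be applied afterwards if one row per value is wanted.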

Upvotes: 1
