iPrince

Reputation: 125

How to convert a string to a dict with pysparkSQL

In pysparkSQL, I have a DataFrame called bmd2 like this:

DataFrame[genres: string, id: int, tagline: string, title: string, vote_average: double, vote_count: int]

And the data bmd2['genres'] goes like this:

bmd2.select('genres').show():
+--------------------+
|              genres|
+--------------------+
|[{'id': 16, 'name...|
|[{'id': 12, 'name...|
|[{'id': 10749, 'n...|
|[{'id': 35, 'name...|
|[{'id': 35, 'name...|
|[{'id': 28, 'name...|
|[{'id': 35, 'name...|
|[{'id': 28, 'name...|
|[{'id': 28, 'name...|
|[{'id': 12, 'name...|
|[{'id': 35, 'name...|
|[{'id': 35, 'name...|
|[{'id': 10751, 'n...|
|[{'id': 36, 'name...|
|[{'id': 28, 'name...|
|[{'id': 18, 'name...|
|[{'id': 18, 'name...|
|[{'id': 80, 'name...|
|[{'id': 80, 'name...|
|[{'id': 28, 'name...|
+--------------------+
only showing top 20 rows

The values in column 'genres' are strings, but each one could be converted to a list of dicts with Python's eval() function. So how should I apply eval() here to turn the string in every row into a list? I tried several ways:

  1. bmd2.select('genres'.astype('list')): AttributeError: 'str' object has no attribute 'astype'
  2. bmd2.select(eval('genres')): NameError: name 'genres' is not defined
  3. bmd2.withColumn('genres', eval('genres')): NameError: name 'genres' is not defined
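(For context on the errors above: eval('genres') evaluates the literal string 'genres' as a Python expression, and no variable with that name exists, hence the NameError. On a single value pulled out of the column, plain Python parsing does work; ast.literal_eval is the safer choice for literal strings like these. A minimal sketch with a made-up sample value:)

```python
import ast

# A sample value from the 'genres' column -- a plain Python string:
s = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"

# ast.literal_eval parses literal expressions like eval(), but without
# executing arbitrary code
genres = ast.literal_eval(s)
print(type(genres).__name__)        # list
print([g['name'] for g in genres])  # ['Animation', 'Comedy']
```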

Upvotes: 1

Views: 2752

Answers (2)

iPrince

Reputation: 125

I solved my question by using a UDF, i.e. a user-defined function.

First, import it:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

Then define your UDF, much like an anonymous function:

getdirector = udf(lambda x: [i['name'] for i in x if i['job'] == 'Director'], ArrayType(StringType()))

(The lambda returns a list of names, so the declared return type is ArrayType(StringType()) rather than a plain StringType().)

You should declare the return type here, so the UDF yields a value of the expected type. Then you can call this UDF in your code like any other function.

# getcharacter is another UDF, defined the same way as getdirector
cres2 = cres1.select('id', getcharacter('cast').alias('cast'), getdirector('crew').alias('crew'))

For this problem, I can adapt the UDF to return whatever type I need.
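The same pattern applies to the genres column from the question: the parsing logic is ordinary Python, so it can be written and tested on its own and then wrapped in a udf. A minimal sketch, assuming the genres strings are well-formed Python literals (the Spark wiring is shown in comments and not run here):

```python
import ast

def parse_genres(s):
    # Turn "[{'id': 16, 'name': 'Animation'}]" into a list of dicts
    return ast.literal_eval(s) if s else []

# Wrapping it as a UDF would look roughly like this:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import ArrayType, StructType, IntegerType, StringType
# schema = ArrayType(StructType().add('id', IntegerType()).add('name', StringType()))
# parse_udf = udf(parse_genres, schema)
# bmd2 = bmd2.withColumn('genres', parse_udf('genres'))

print(parse_genres("[{'id': 12, 'name': 'Adventure'}]"))
```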

Upvotes: 2

fatmali

Reputation: 122

I'm writing this as an answer as I can't find the comment option. I would suggest you take a look at from_json in pyspark.sql.functions. For example, this is how you would use it:

# given a row that looks like:

+---------------------------+
|          genres           |
+---------------------------+
| [{ id:1, name:"hiphop"}]  |
+---------------------------+

# define a schema
from pyspark.sql.types import ArrayType, StructType, IntegerType, StringType

schema = ArrayType(StructType().add("id", IntegerType()) \
                               .add("name", StringType()))

# transform
from pyspark.sql.functions import from_json

new_df = df.select(from_json("genres", schema).alias("genres_dict"))

# display
new_df.printSchema()
new_df.show()

There is one more way to achieve this, using a function called regexp_extract, but the above is my personal preference. Also, if you want to switch back to the original string, you can use the to_json function. Hope this helps.
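As a mental model, from_json and to_json behave like Python's json module applied per row (this is an analogy, not Spark code). One caveat worth knowing: the question's strings use single quotes, which strict JSON parsers reject, though Spark's JSON reader is more lenient by default (see its allowSingleQuotes option):

```python
import json

# from_json-style: JSON string -> structured data
s = '[{"id": 16, "name": "Animation"}]'
genres = json.loads(s)
print(genres[0]['name'])  # Animation

# to_json-style: structured data -> JSON string
print(json.dumps(genres, separators=(',', ':')))  # [{"id":16,"name":"Animation"}]

# Strict JSON requires double quotes, so a single-quoted string fails here:
try:
    json.loads("[{'id': 16}]")
except json.JSONDecodeError:
    print('not valid JSON')
```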

Upvotes: 1
