Dhanu

Reputation: 1

Map JSON string to struct in PySpark

I have JSON in a string, as follows:

'''{"col1":"value1", "col2":"[{'col3':'val3'},{'col3':'val4'}]"}'''

I want to convert it to:

{"col1":"value1",
 "col2":[ {'col3':'val3'}, {'col3':'val4'}]}

I then want to read this into a PySpark DataFrame. How do I convert the list inside the string to a JSON struct?

Upvotes: 0

Views: 901

Answers (1)

Torxed

Reputation: 23500

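The data as a whole is not the JSON you think it is. The outer object parses, but the value of col2 is a plain string, and its contents are not valid JSON because ' characters are not allowed as string delimiters in JSON. The best option would be to go back to wherever this is generated and correct the malformed data before going any further.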
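A quick way to see where the data is malformed, as a minimal sketch using only the standard json module (the variable names are just for illustration): the outer document loads, while the text stored under col2 is rejected.

```python
import json

raw = '''{"col1":"value1", "col2":"[{'col3':'val3'},{'col3':'val4'}]"}'''

# The outer object is accepted: col2 is just an ordinary string value.
outer = json.loads(raw)
inner_text = outer["col2"]

# The inner text uses single quotes, which the JSON grammar rejects.
try:
    json.loads(inner_text)
    inner_is_json = True
except json.JSONDecodeError:
    inner_is_json = False
```

Here inner_is_json ends up False, which is why col2 comes back as a string rather than a list.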

Once you have corrected the bad data, you can do:

import json
result = json.loads('''{"col1":"value1", "col2":[{"col3":"val3"},{"col3":"val4"}]}''')

If you can't change how the data is given to you, one solution would be to string-replace the bad characters (but this might cause all sorts of trouble along the way, for example when a value itself contains a ' character):

import json
result = json.loads('''{"col1":"value1", "col2":"[{'col3':'val3'},{'col3':'val4'}]"}''')
result['col2'] = json.loads(result['col2'].replace("'", '"'))
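To illustrate the kind of trouble the replace approach can run into, and one possible alternative: the inner text happens to be a valid Python literal, so ast.literal_eval from the standard library can parse it without touching the quotes. This is only a sketch, assuming the inner values are always plain Python-style literals; the apostrophe in the sample value below is a made-up example, not part of the original data.

```python
import ast
import json

# Hypothetical inner text whose value contains an apostrophe.
inner_text = '''[{'col3': "it's val3"}, {'col3': 'val4'}]'''

# Blind replacement turns the apostrophe into a double quote and breaks the JSON.
try:
    json.loads(inner_text.replace("'", '"'))
    replace_ok = True
except json.JSONDecodeError:
    replace_ok = False

# literal_eval only evaluates literals (lists, dicts, strings, numbers, ...),
# so it does not execute arbitrary code the way eval would.
col2 = ast.literal_eval(inner_text)
```

The trade-off is that literal_eval accepts Python syntax rather than JSON, so values such as true or null in the inner text would fail to parse.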

Either way, I would go back and rework the way you get the data for the most reliable results. As it stands, col2 is not JSON data, at least not in the sense you think it is.

Upvotes: 1
