Reputation: 309
I have a database with one nested column. I need to split each value inside this nested cells. So my idea is to take this nested string and transform in json with json.loads
then use json_normalize
to create a dataframe with this nested values but my data has some values that are causing me some problems, like this:
df['column'][1]:
"[{'id': 124, 'status': 'incomplete', 'formation': 'asddas', 'institution': 'asdasd', 'course': 'asdsads', 'startMonth': 7, 'startYear': 2018, 'endMonth': xx, 'endYear': None}, {'id': 7654414, 'status': 'complete', 'formation': 'technological', 'institution': 'asdasdada', 'course': 'asdasdasdsa', 'startMonth': 1, 'startYear': 2016, 'endMonth': 6, 'endYear': 2018}]"
df['column'][2]:
'[{"institution":"asdadsad","startMonth":2,"startYear":1992,"course":"prop e.mkt e adm empresas","id":11111,"formation":"asdasd","endMonth":12,"endYear":1996,"status":"complete"},{"institution":"Universidade Sant\'Anna","startMonth":2,"startYear":1998,"course":"asdasdasd","id":2007186,"formation":"asdasda","endMonth":12,"endYear":2002,"status":"complete"}]'
As you can see, the first cell has single quotes, so I used df['column'].str.replace("'", '"')
to put double quotes then I used json.loads
to transform to json and it worked.
But after the transformation the single quote in the second line
({"institution":"Universidade Sant\'Anna")
became
(Sant"Anna)
so the json.loads
raises a error.
df['column'][2]: (After transformation)
'[{"institution":"asdadsad","startMonth":2,"startYear":1992,"course":"prop e.mkt e adm empresas","id":11111,"formation":"asdasd","endMonth":12,"endYear":1996,"status":"complete"},{"institution":"Universidade Sant"Anna","startMonth":2,"startYear":1998,"course":"asdasdasd","id":2007186,"formation":"asdasda","endMonth":12,"endYear":2002,"status":"complete"}]'
json.loads(df['column'])
Error
JSONDecodeError: Expecting ',' delimiter: line 1 column 247 (char 246)
I tried a lot of transformations to solve the problem but I wasn't able to find a solution that solve the two situations in a row.
Upvotes: 0
Views: 60
Reputation: 5648
Because you don't have to do any transformations to rows with double quotes, apply a lambda function to test the beginning of the string.
If I set your first example to s
, I can test s.startswith("[{'")
to look for a single quote. This returns True
something like this (untested but should be close). It'll return a transformed string only if it starts with a single quote.
df['column'].apply(lambda x: x.replace("'", '"') if x.startswith("[{'") else x)
Upvotes: 1