Reputation: 133
I have a big dataframe of around 30000 rows and a single column containing a json string. Each json string contains a number of variables and its value I want to break this json string down into columns of data
two rows looks like
0 {"a":"1","b":"2","c":"3"}
1 {"a" ;"4","b":"5","c":"6"}
I want to convert this into a dataframe like
a b c
1 2 3
4 5 6
Please help
Upvotes: 2
Views: 12066
Reputation: 5696
Your column values seem to have an extra number before the actual json string. So you might want strip that out first (skip to Method if that isn't the case)
One way to do that is to apply a function to the column
# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])
# print(df)
json
# 0 0 {"a":"1","b":"2","c":"3"}
# 1 1 {"a" :"4","b":"5","c":"6"}
# function to remove the number
import re
def split_num(val):
p = re.compile("({.*)")
return p.search(val).group(1)
# applying the function
df['json'] = df['json'].map(lambda x: split_num(x))
print(df)
# json
# 0 {"a":"1","b":"2","c":"3"}
# 1 {"a" :"4","b":"5","c":"6"}
Method:
Once the df
is in the above format, the below will convert each row entry to a dictionary:
df['json'] = df['json'].map(lambda x: dict(eval(x)))
Then, applying pd.Series
to the column will do the job
d = df['json'].apply(pd.Series)
print(d)
# a b c
# 0 1 2 3
# 1 4 5 6
Upvotes: 1
Reputation: 1594
with open(json_file) as f:
df = pd.DataFrame(json.loads(line) for line in f)
Upvotes: 0