PRATHAMESH
PRATHAMESH

Reputation: 133

Convert a column of json strings into columns of data

I have a big dataframe of around 30000 rows and a single column containing a json string. Each json string contains a number of variables and its value I want to break this json string down into columns of data

two rows looks like

0 {"a":"1","b":"2","c":"3"}
1 {"a" ;"4","b":"5","c":"6"}

I want to convert this into a dataframe like

a   b   c
1   2   3
4   5   6

Please help

Upvotes: 2

Views: 12066

Answers (2)

akilat90
akilat90

Reputation: 5696

Your column values seem to have an extra number before the actual json string. So you might want strip that out first (skip to Method if that isn't the case)

One way to do that is to apply a function to the column

# constructing the df
df = pd.DataFrame([['0 {"a":"1","b":"2","c":"3"}'],['1 {"a" :"4","b":"5","c":"6"}']], columns=['json'])

# print(df)
                         json
# 0  0 {"a":"1","b":"2","c":"3"}
# 1  1 {"a" :"4","b":"5","c":"6"}

# function to remove the number
import re

def split_num(val):
    p = re.compile("({.*)")
    return p.search(val).group(1)

# applying the function
df['json'] = df['json'].map(lambda x: split_num(x))
print(df)

#                          json
# 0   {"a":"1","b":"2","c":"3"}
# 1  {"a" :"4","b":"5","c":"6"}

Method:

Once the df is in the above format, the below will convert each row entry to a dictionary:

df['json'] = df['json'].map(lambda x: dict(eval(x)))

Then, applying pd.Series to the column will do the job

d = df['json'].apply(pd.Series)
print(d)
#   a  b  c
# 0  1  2  3
# 1  4  5  6

Upvotes: 1

nimrodz
nimrodz

Reputation: 1594

with open(json_file) as f:
    df = pd.DataFrame(json.loads(line) for line in f)

Upvotes: 0

Related Questions