Reputation: 167
I need to read in a JSON file into pandas df. JSON data looks like this:
{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}
{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}
I need to get rid of the first key that holds all columns in it. I tried:
import json
import pandas as pd
from pandas.io.json import json_normalize
data_pd = pd.read_json('db/my_file.json', lines=True)
new_data = json_normalize(data_pd)
The error message is: AttributeError: 'str' object has no attribute 'values'
The desired output is:
id ActionName Time
138307057680 Complete 2020-04-23-12:40:04
138313115245 Midpoint 2020-06-16-20:41:16
Currently if I print out data_pd
, the output is:
Upvotes: 1
Views: 55
Reputation: 9267
You can clean the data passed to Pandas
before generating your dataframe like this example:
import json
import pandas as pd
def gen_data(file_path):
with open(file_path) as f:
for line in f.readlines():
if line:
line = json.loads(line)
for value in line.values():
yield value
df = pd.DataFrame(gen_data('db/my_file.json'))
print(df)
Output:
id ActionName Time
0 138307057680 Complete 2020-04-23-12:40:04
1 138313115245 Midpoint 2020-06-16-20:41:16
Bonus:
A bit of speed comparaison (I'm using an i7):
If you Clean your data first then generate your DF:
>> %timeit pd.DataFrame(gen_data('db/my_file.json'))
519 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
And if you generate your DF then clean it:
import pandas as pd
def gen_df_method2(file_path):
data_pd = pd.read_json(file_path, lines=True)
return pd.DataFrame(data_pd['f0_'].values.tolist())
>> %timeit gen_df_method2('db/my_file.json')
2.66 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 1
Reputation: 150815
You can try with:
new_data = pd.DataFrame(data_pd['f0_'].values.tolist())
Output:
id ActionName Time
0 138307057680 Complete 2020-04-23-12:40:04
1 138313115245 Midpoint 2020-06-16-20:41:16
Upvotes: 1