Reputation: 41
I'm using pandas to convert multiple JSON files into a dataframe. I only want the entries that match certain criteria from those files, but currently I'm appending each whole converted file and then filtering.
Suppose I have 2 json files that look like this:
File 1500.json
[
{
"CodStore": 1500,
"CodItem": 10,
"NameItem": "Burger",
"Price": 10.0
},
{
"CodStore": 1500,
"CodItem": 20,
"NameItem": "Fries",
"Price": 3.0
},
{
"CodStore": 1500,
"CodItem": 30,
"NameItem": "Ice Cream",
"Price": 1.0
}
]
File 1805.json
[
{
"CodStore": 1805,
"CodItem": 10,
"NameItem": "Burger",
"Price": 9.0
},
{
"CodStore": 1805,
"CodItem": 20,
"NameItem": "Fries",
"Price": 2.0
},
{
"CodStore": 1805,
"CodItem": 30,
"NameItem": "Ice Cream",
"Price": 0.5
}
]
I only want entries with CodItem 10 or 30 in my dataframe, so my Python code looks like this:
from pandas import DataFrame, read_json

df = DataFrame()
stores = [1500, 1805]
for store in stores:
    filename = '%s.json' % store
    df = df.append(read_json(filename))
df = df[(df.CodItem == 10) | (df.CodItem == 30)]
This is just an example; the real problem is that I have more than 600 JSON files, so reading takes a lot of time, the dataframe becomes very long, and memory consumption is very high.
Is there a way to read only the matching entries into the dataframe?
Upvotes: 1
Views: 2114
Reputation: 402333
One option would be to append the raw JSON records from each file to a list (using the standard json module), then convert to a DataFrame once at the end and filter.
import json
import pandas as pd

coditems = [10, 30]
data = []
for filename in json_files:  # json_files: the list of paths to read
    with open(filename) as f:
        data.extend(json.load(f))  # each file is a JSON array of records
df = pd.DataFrame(data).query('CodItem in @coditems')
This should be a lot faster, because calling append repeatedly is a quadratic operation: each call copies the entire DataFrame built so far. You have to read all the data in anyway, so you may as well build the DataFrame in a single call.
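If memory is the bigger concern, you could also drop the non-matching records as you read each file, so the list never holds rows you are going to discard. A minimal sketch along the same lines, assuming each file is a JSON array of objects as in the question:

import json
import pandas as pd

coditems = {10, 30}
data = []
for filename in json_files:
    with open(filename) as f:
        # keep only the wanted CodItem values before accumulating
        data.extend(r for r in json.load(f) if r['CodItem'] in coditems)
df = pd.DataFrame(data)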
Another option would be to initialise your DataFrames inside a loop and then call pd.concat after you're done.
df_list = []
for filename in json_files:
    df_list.append(pd.read_json(filename))  # read_json already returns a DataFrame
df = pd.concat(df_list, ignore_index=True).query('CodItem in @coditems')
Upvotes: 2
Reputation: 8269
You can create a temporary DataFrame within your loop and filter it before appending:
from pandas import DataFrame, read_json

df = DataFrame()
stores = [1500, 1805]
for store in stores:
    filename = '%s.json' % store
    temp_df = read_json(filename)
    df = df.append(temp_df[(temp_df.CodItem == 10) | (temp_df.CodItem == 30)])
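Note that DataFrame.append was deprecated and removed in pandas 2.0, so on recent versions you can collect the per-file filtered frames in a list and concatenate them instead. A small sketch of that variant:

import pandas as pd
from pandas import read_json

stores = [1500, 1805]
filtered = []
for store in stores:
    temp_df = read_json('%s.json' % store)
    # keep only the rows with the wanted CodItem values before collecting
    filtered.append(temp_df[temp_df.CodItem.isin([10, 30])])
df = pd.concat(filtered, ignore_index=True)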
Upvotes: 0