Reputation: 87
I'm working on a method to read data from `input.csv` and update the `stock` column in `output.csv` based on the product's `id`. These are the steps I'm working on right now:

1. Read product info from `input.csv` into `input_data = []`, which will return a list of `OrderedDict`. `input_data` currently looks like this:

```
[OrderedDict([('id', '1'), ('name', 'a'), ('stock', '33')]),
 OrderedDict([('id', '2'), ('name', 'b'), ('stock', '66')]),
 OrderedDict([('id', '3'), ('name', 'c'), ('stock', '99')])]
```

2. Read current product info from `output.csv` into `output_data = []`, which has the same schema as `input_data`.
3. Iterate through `input_data` and update the `stock` column in `output_data` based on the stock info in `input_data`. What's the best way to do this?
An important note: there might be some IDs which exist in `input_data` but do not exist in `output_data`. I would like to update the stock for the `id`s common to `input_data` and `output_data`; the "new" `id`s would most likely be written to a new csv.
I was thinking of something like this (pseudocode, not real code):

```python
for p in input_data:
    # check if p['id'] exists in the list of output_data IDs
    # (I might have to build a list of IDs in output_data for this,
    #  in order to check it against input_data IDs)
    # if p['id'] exists in output_data, write the stock to the
    # corresponding product in output_data
    # else, append p to another csv
```

I know this looks pretty messy; what I'm asking for is a logical way to approach this task without wasting too much compute time. The files in question will probably be around 100,000 rows long, so performance and speed will be an issue.
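For context, here is what that pseudocode might look like made concrete, using a dict that indexes `output_data` rows by id so each lookup is O(1) instead of a linear scan (the `output_data` values below are made up for illustration):

```python
from collections import OrderedDict

# Sample data shaped like the question's input_data / output_data.
input_data = [
    OrderedDict([('id', '1'), ('name', 'a'), ('stock', '33')]),
    OrderedDict([('id', '2'), ('name', 'b'), ('stock', '66')]),
    OrderedDict([('id', '3'), ('name', 'c'), ('stock', '99')]),
]
output_data = [
    OrderedDict([('id', '1'), ('name', 'a'), ('stock', '31')]),
    OrderedDict([('id', '3'), ('name', 'c'), ('stock', '95')]),
]

# Build the index once: O(n) to build, O(1) per lookup afterwards.
rows_by_id = {row['id']: row for row in output_data}

new_rows = []  # rows whose id is missing from output_data
for p in input_data:
    row = rows_by_id.get(p['id'])
    if row is not None:
        row['stock'] = p['stock']  # update the matching row in place
    else:
        new_rows.append(p)         # candidate for the "new" csv
```

Since `rows_by_id` holds references to the same `OrderedDict` objects as `output_data`, updating a row through the index updates `output_data` too.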
If my data in `input_data` and `output_data` is a `list` of `OrderedDict`, what is the best way to check each `id` in `input_data` and write the `stock` to the product with the exact same `id` in `output_data`?
Upvotes: 2
Views: 43
Reputation: 2155
While Python might not be your best option for this, I wouldn't use lists of `OrderedDict` for the task. This is simply because changing something within `output_data` would require an O(n) lookup per row, which turns your script into O(n**2).

I would load the two files into dicts keyed by id (or `OrderedDict`s if you care about order), like this, and reduce the complexity of the whole thing to O(n):

```python
input_data = {
    '1': ['a', '33'],
    '2': ['b', '66'],
    '3': ['c', '99'],
}
output_data = {
    '1': ['a', '31'],
    '3': ['c', '95'],
}
```
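Building those dicts from the csv files could look like this (a sketch: `load_keyed` is a helper name I made up, and the column names are the ones shown in the question):

```python
import csv

def load_keyed(lines):
    """Parse rows with columns id,name,stock into {id: [name, stock]}."""
    return {row['id']: [row['name'], row['stock']]
            for row in csv.DictReader(lines)}

# A file object works the same way, e.g.:
#   with open('input.csv', newline='') as f:
#       input_data = load_keyed(f)
input_data = load_keyed(['id,name,stock', '1,a,33', '2,b,66', '3,c,99'])
output_data = load_keyed(['id,name,stock', '1,a,31', '3,c,95'])
```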
```python
# iterate through all keys in input_data and update output_data;
# if a key does not exist in output_data, collect it in a different dict
new_data = {}
for key in input_data:
    if key not in output_data:
        new_data[key] = input_data[key]
        # for optimisation's sake you could append the row to the new
        # file here instead of saving it into a new dict
    else:
        output_data[key][1] = input_data[key][1]
        # for optimisation's sake you could append the row to a new output
        # file here, and rename/move the new output file over the old
        # output file after the script finishes
```
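Putting it together end to end, the whole flow might be sketched like this (the function name `sync_stock`, the file paths, and the fixed `id,name,stock` column set are assumptions for illustration, not from the question):

```python
import csv

FIELDS = ['id', 'name', 'stock']

def sync_stock(input_path, output_path, new_path):
    """Update stock in output_path from input_path; rows whose id only
    appears in input_path are written to new_path. O(n) overall."""
    # One dict keyed by id for O(1) lookups.
    with open(input_path, newline='') as f:
        input_rows = {row['id']: row for row in csv.DictReader(f)}

    with open(output_path, newline='') as f:
        output_rows = list(csv.DictReader(f))

    matched = set()
    for row in output_rows:
        src = input_rows.get(row['id'])
        if src is not None:
            row['stock'] = src['stock']
            matched.add(row['id'])

    # Rewrite the output file with the updated stocks.
    with open(output_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(output_rows)

    # Dump the ids that never matched into the "new" csv.
    with open(new_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(r for i, r in input_rows.items()
                         if i not in matched)
```

At 100,000 rows this is still a few in-memory dicts and two sequential passes, which Python handles comfortably.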
Upvotes: 2