I have a CSV file with the following structure:
user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
I want to iterate over the file so that, after processing all of a user's rows and before moving on to the next user, I can do some additional work based on the processed user's codes. A user can have multiple entries, anywhere from 1 to n.
For example, as we process a user, we keep track of the first letter of each code that user has mentioned by adding it to a list. This list must be reset after each user is processed (it is per user).
As an example, consider user_id 0001: when I reach the first row of 0002, I want to add more rows for user 0001 containing the codes we have not seen for that user.
Here is how I tried to accomplish this:
import csv
import os

# list_with_all_codes (the full list of code prefixes) is defined elsewhere
with open(os.path.expanduser('out_put_csv_file.csv'), 'w+') as data_file:
    writer = csv.writer(data_file)
    writer.writerow(('user_id', 'user_name', 'code'))
    l_file = csv.DictReader(open('some_file_name'))
    previous_user = None
    current_user = None
    tracker = []
    for row in l_file:
        current_user = row['user_id']
        tracker.append(row['code'].split('-')[0])
        writer.writerow([row['user_id'], row['user_name'], row['code']])
        if current_user != previous_user:
            for l_code in list_with_all_codes:
                if l_code not in tracker:
                    writer.writerow([row['user_id'], row['user_name'], l_code])
            tracker = []
        previous_user = current_user
The problem is that I get the following:
user_id,user_name,code
0001,user_a,e-5
0001,user_a,n
0001,user_a,t
0001,user_a,i
0001,user_a,s #don't want this
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,n
0002,user_b,t # don't want this
0002,user_b,i
0002,user_b,ta-5
Instead, what I want is:
user_id,user_name,code
0001,user_a,e-5
0001,user_a,n
0001,user_a,t
0001,user_a,i
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,n
0002,user_b,i
0002,user_b,ta-5
What am I doing wrong here? What's the best way to accomplish this?
Your problem is that you write one line of data for the new user before realizing that you need to fill in for the old user... and then you write the old user's fill-in rows using the new user's ID and name.
Since you want to write multiple bits of data about the previous user, you'll need to keep his/her entire row. When you see a new user, write the data for the old user (using his info) before you do anything else. There is a special case for the first user when there isn't any previous user to deal with.
import os
import csv

# Create a small sample input file for the demo
open('some_file_name', 'w').write("""user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
""")

list_with_all_codes = ['e', 's', 'n', 't', 'a']

def set_unused_codes(writer, row, tracker):
    # Write one row for every code prefix this user did not use
    for l_code in list_with_all_codes:
        if l_code not in tracker:
            writer.writerow([row['user_id'], row['user_name'], l_code])

# newline='' is what the csv docs recommend for files passed to csv.writer
with open(os.path.expanduser('out_put_csv_file.csv'), 'w+', newline='') as data_file:
    writer = csv.writer(data_file)
    writer.writerow(('user_id', 'user_name', 'code'))
    l_file = csv.DictReader(open('some_file_name'))
    previous_row = None
    tracker = []
    for row in l_file:
        if not previous_row:
            # Special case: the first user has no previous user
            previous_row = row
        if row['user_id'] != previous_row.get('user_id'):
            # New user: finish the previous user with *their* row data
            set_unused_codes(writer, previous_row, tracker)
            previous_row = row
            tracker = []
        tracker.append(row['code'].split('-')[0])
        writer.writerow([row['user_id'], row['user_name'], row['code']])
    # Finish the last user after the loop ends
    set_unused_codes(writer, row, tracker)

print(open('out_put_csv_file.csv').read())
The output is...
user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0001,user_a,n
0001,user_a,t
0001,user_a,a
0002,user_b,e-N
0002,user_b,t-5
0002,user_b,s
0002,user_b,n
0002,user_b,a
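As an alternative not taken from either answer, the standard library's itertools.groupby groups consecutive rows that share a key, which removes both the first-user special case and the extra call after the loop. This is a sketch that assumes each user's rows are contiguous in the file, as in the sample; SAMPLE, ALL_CODES and fill_missing_codes are hypothetical names, and ALL_CODES is just the illustrative code list from the answer's example.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample input and code list (same as the answer's demo)
SAMPLE = """user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
"""
ALL_CODES = ['e', 's', 'n', 't', 'a']


def fill_missing_codes(reader, writer, all_codes):
    """Write each user's rows, then one row per code prefix the user never used.

    Assumes rows for the same user are contiguous: groupby only groups
    consecutive rows with equal keys.
    """
    writer.writerow(['user_id', 'user_name', 'code'])
    for _, rows in groupby(reader, key=lambda r: r['user_id']):
        rows = list(rows)              # materialize this user's group
        seen = set()
        for row in rows:
            seen.add(row['code'].split('-')[0])
            writer.writerow([row['user_id'], row['user_name'], row['code']])
        last = rows[-1]
        for code in all_codes:         # keeps the order of all_codes
            if code not in seen:
                writer.writerow([last['user_id'], last['user_name'], code])


out = io.StringIO()
fill_missing_codes(csv.DictReader(io.StringIO(SAMPLE)),
                   csv.writer(out, lineterminator='\n'),
                   ALL_CODES)
print(out.getvalue())
```

The output matches the answer's output above. To write a real file instead of a StringIO, open it with open(path, 'w', newline='') and pass it to csv.writer; the lineterminator override is only there to keep the in-memory result simple.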
If you don't mind the order in which your missing codes are written, you could use sets to speed things up by a minuscule amount (did I oversell that?!)
set_of_all_codes = set(list_with_all_codes)
... the for loop ...
for code in set_of_all_codes - set(tracker):
    writer.writerow(...)
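Two hedged variations on the set idea. Iterating over a set of strings gives an arbitrary order that can change from run to run (string hashing is randomized per process), so sorting the difference makes the output stable; alternatively, filtering the original list against a set keeps the list's order while still using set membership tests. missing_codes and missing_codes_in_order are hypothetical helper names.

```python
def missing_codes(all_codes, tracker):
    """Codes from all_codes that are not in tracker, in sorted order."""
    return sorted(set(all_codes) - set(tracker))


def missing_codes_in_order(all_codes, tracker):
    """Codes from all_codes that are not in tracker, in the order of all_codes."""
    seen = set(tracker)  # set membership tests are O(1) on average
    return [code for code in all_codes if code not in seen]


list_with_all_codes = ['e', 's', 'n', 't', 'a']
print(missing_codes(list_with_all_codes, ['e', 's']))           # ['a', 'n', 't']
print(missing_codes_in_order(list_with_all_codes, ['e', 's']))  # ['n', 't', 'a']
```

Either helper could replace the loop body in set_unused_codes, iterating over its result and calling writer.writerow for each code.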
The common pattern when you need to see the next data unit before knowing you're done with the current one is as follows (loosely sketched after your use case):
oldname = ""
data = []
# input_lines: the lines of your CSV file, header already skipped
for row in input_lines:
    n, name, code = row.rstrip('\n').split(',')
    if name != oldname:
        if data: flush(data)
        data = []
        oldname = name
    update(data, n, name, code)
# remember to flush the data buffer when you're done with your file
if data: flush(data)
data could be a list holding n, name and a list of the user's codes, as in:
def update(data, n, name, code):
    if not data:
        data.append(n)
        data.append(name)
        data.append([code])
    else:
        data[2].append(code)
With respect to flush, if you don't know how to order your output (re your comment following the question), neither do I. But it's just a matter of iterating over data[2] and list_with_all_codes; you've already done something similar in your original code.
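A minimal end-to-end sketch of this buffer-and-flush pattern, under stated assumptions: the answer leaves flush undefined, so the version here is hypothetical. It writes the user's own codes first and then every unused prefix in list_with_all_codes order, and it takes the writer as a second argument. The sketch also keys on the user id rather than the name and reads with csv.reader instead of splitting lines by hand, so quoted fields are handled.

```python
import csv
import io

# Hypothetical sample input (from the question) and an assumed code list
SAMPLE = """user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
"""
list_with_all_codes = ['e', 's', 'n', 't', 'a']


def update(data, n, name, code):
    # data is [user_id, user_name, [codes...]] for the user being buffered
    if not data:
        data.append(n)
        data.append(name)
        data.append([code])
    else:
        data[2].append(code)


def flush(data, writer):
    # Assumed output order: the user's own codes, then every unused prefix
    n, name, codes = data
    used = {code.split('-')[0] for code in codes}
    for code in codes:
        writer.writerow([n, name, code])
    for code in list_with_all_codes:
        if code not in used:
            writer.writerow([n, name, code])


out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
reader = csv.reader(io.StringIO(SAMPLE))
writer.writerow(next(reader))          # copy the header row

old_id = None
data = []
for n, name, code in reader:
    if n != old_id:                    # a new user starts: flush the previous one
        if data:
            flush(data, writer)
        data = []
        old_id = n
    update(data, n, name, code)
if data:                               # flush the last user after the loop
    flush(data, writer)

print(out.getvalue())
```

Buffering a whole user in data costs memory proportional to that user's row count, which is usually small; the payoff is that flush sees all of the user's codes at once before deciding what to add.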