Python iterate over csv row by row while keeping track of content of next row

Question

I have a CSV with the following structure:

user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5

I want to iterate over the file so that, after processing all of a user's rows and before moving on to the next user, I can do some additional work based on the processed user's codes. A user can have multiple entries; the number of entries can be anywhere from 1 to n.

For example, as we process a user, we keep track of the first letter of each code that the user has mentioned by adding it to a list. This list needs to be reset after each user is processed (it is per-user state).

As an example, consider user_id 0001: once I reach the first row of 0002, I want to add more rows for user 0001, where the new rows carry the codes that we have not yet seen for that user.

Here is how I tried to accomplish this:

    with open(os.path.expanduser('out_put_csv_file.csv'), 'w+') as data_file:
        writer = csv.writer(data_file)
        writer.writerow(('user_id', 'user_name', 'code'))
        l_file = csv.DictReader(open('some_file_name'))
        previous_user = None
        current_user = None
        tracker = []
        for row in l_file:
            current_user = row['user_id']
            tracker.append(row['code'].split('-')[0])
            writer.writerow([row['user_id'], row['user_name'], row['code']])
            if current_user != previous_user:
                for l_code in list_with_all_codes:
                        if l_code not in tracker:
                               writer.writerow([row['user_id'], row['user_name'], l_code])
                tracker = []
            previous_user = current_user

The problem is that I get the following:

user_id,user_name,code
0001,user_a,e-5
0001,user_a,n
0001,user_a,t
0001,user_a,i
0001,user_a,s #don't want this
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,n
0002,user_b,t # don't want this
0002,user_b,i
0002,user_b,ta-5

Instead, what I want is:

    user_id,user_name,code
    0001,user_a,e-5
    0001,user_a,n
    0001,user_a,t
    0001,user_a,i
    0001,user_a,s-N
    0002,user_b,e-N
    0002,user_b,n
    0002,user_b,i
    0002,user_b,ta-5

What am I doing wrong here? What's the best way to accomplish this?

tdelaney · Accepted Answer

Your problem is that you write one line of data for the new user before realizing that you need to fill in for the old user... and then you write the old user's data using the new user's name.

Since you want to write multiple bits of data about the previous user, you'll need to keep that user's entire row. When you see a new user, write the data for the old user (using the saved row) before you do anything else. There is a special case for the first user, when there isn't any previous user to deal with.

import os
import csv

# create the sample input file from the question
with open('some_file_name', 'w') as sample_file:
    sample_file.write("""user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
""")

list_with_all_codes = ['e', 's', 'n', 't', 'a']

def set_unused_codes(writer, row, tracker):
    # write one extra row for each known code this user has not used
    for l_code in list_with_all_codes:
        if l_code not in tracker:
            writer.writerow([row['user_id'], row['user_name'], l_code])

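# newline='' is what the csv docs recommend for files handed to csv readers/writers
with open(os.path.expanduser('out_put_csv_file.csv'), 'w', newline='') as data_file, \
        open('some_file_name', newline='') as in_file:
    writer = csv.writer(data_file)
    writer.writerow(('user_id', 'user_name', 'code'))
    l_file = csv.DictReader(in_file)
    previous_row = None  # first row seen for the user currently being processed
    tracker = []         # code prefixes seen so far for that user
    for row in l_file:
        if previous_row is None:
            # special case: the first user has no previous user
            previous_row = row
        if row['user_id'] != previous_row['user_id']:
            # new user: fill in for the old user before writing anything else
            set_unused_codes(writer, previous_row, tracker)
            previous_row = row
            tracker = []
        tracker.append(row['code'].split('-')[0])
        writer.writerow([row['user_id'], row['user_name'], row['code']])
    # the last user is never followed by a new user, so fill in here;
    # the guard avoids an error when the input has no data rows
    if previous_row is not None:
        set_unused_codes(writer, previous_row, tracker)

with open('out_put_csv_file.csv') as result_file:
    print(result_file.read())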

The output is...

user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0001,user_a,n
0001,user_a,t
0001,user_a,a
0002,user_b,e-N
0002,user_b,t-5
0002,user_b,s
0002,user_b,n
0002,user_b,a
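
As an aside, and not part of the approach above: because each user's rows are contiguous in the input, the "previous row" bookkeeping could also be delegated to `itertools.groupby`. The following is only a sketch under some assumptions: the rows are already grouped by `user_id`, the data is read from and written to in-memory `io.StringIO` objects instead of real files, and the names `fill_missing_codes`, `ALL_CODES` and `SAMPLE` are made up for illustration.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Assumed for illustration: the same code list and sample data as above.
ALL_CODES = ['e', 's', 'n', 't', 'a']

SAMPLE = """user_id,user_name,code
0001,user_a,e-5
0001,user_a,s-N
0002,user_b,e-N
0002,user_b,t-5
"""


def fill_missing_codes(in_file, out_file, all_codes=ALL_CODES):
    """Copy rows to out_file, adding one row per unseen code after each user."""
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerow(('user_id', 'user_name', 'code'))
    # groupby only merges adjacent rows, so the input must be grouped by user_id
    for user_id, rows in groupby(reader, key=itemgetter('user_id')):
        seen = set()
        for row in rows:
            seen.add(row['code'].split('-')[0])
            writer.writerow([row['user_id'], row['user_name'], row['code']])
        # `row` is now the user's last row; every group has at least one row
        for code in all_codes:
            if code not in seen:
                writer.writerow([user_id, row['user_name'], code])


out = io.StringIO()
fill_missing_codes(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With the sample data this should print the same rows as the output above. If the input were not grouped by user, `groupby` would start a new group each time the `user_id` changes, so the file would need sorting first.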

If you don't mind what order your missing codes are written in, you could use sets to speed things up by a minusculely trivial amount (did I over promote that?!)

set_of_all_codes = set(list_with_all_codes)
... the for loop ...

    for code in set_of_all_codes - set(tracker):
        writer.writerow(...)
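
To make the set idea concrete, here is a small standalone sketch of just the set difference, without any file handling. The values of `tracker` are assumed sample data (what user 0001 would have accumulated), and the helper name `missing_codes` is hypothetical.

```python
# Assumed sample values for illustration.
list_with_all_codes = ['e', 's', 'n', 't', 'a']
tracker = ['e', 's']  # code prefixes already seen for one user

set_of_all_codes = set(list_with_all_codes)


def missing_codes(all_codes, seen):
    """Return the codes in all_codes that do not appear in seen."""
    return set(all_codes) - set(seen)


missing = missing_codes(set_of_all_codes, tracker)

# Sets have no defined order; sort if a repeatable order matters.
for code in sorted(missing):
    print(code)
```

Iterating over the set directly, as in the snippet above, gives the same codes but in no guaranteed order, which is the trade-off mentioned earlier.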
