Vash
Vash

Reputation: 141

Code swap. How would I swap the value of one CSV file column to another?

I have two CSV files. The first file(state_abbreviations.csv) has only states abbreviations and their full state names side by side(like the image below), the second file(test.csv) has the state abbreviations with additional info.

I want to replace each state abbreviation in test.csv with its associated state full name from the first file.

My approach was to read reach file, built a dict of the first file(state_abbreviations.csv). Read the second file(test.csv), then compare if an abbreviation matches the first file, if so replace it with the full name.

Any help is appreacited

import csv


state_initials = ("state_abbr")
state_names = ("state_name")
state_file = open("state_abbreviations.csv","r")
state_reader = csv.reader(state_file)

headers = None
final_state_initial= []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
           if col in state_initials:
            headers.append(i)
    else:
        final_state_initial.append((row[0]))

print final_state_initial


headers = None
final_state_abbre= []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
           if col in state_initials:
            headers.append(i)
    else:
        final_state_abbre.append((row[1]))

print final_state_abbre

final_state_initial
final_state_abbre
state_dictionary = dict(zip(final_state_initial, final_state_abbre))
print state_dictionary

enter image description here enter image description here

Upvotes: 0

Views: 1324

Answers (3)

zwer
zwer

Reputation: 25799

You almost got it, the approach that is - building out a dict out of the abbreviations is the easiest way to do this:

with open("state_abbreviations.csv", "r") as f:
    # you can use csv.DictReader() instead but lets strive for performance
    reader = csv.reader(f)
    next(reader)  # skip the header
    # assuming the first column holds the abbreviation, second the full state name
    state_map = {state[0]: state[1] for state in reader}

Now you have state_map containing a map of all your state abbreviations, for example: state_map["FL"] contains Florida.

To replace the values in your test.csv, tho, you'll either have to load the whole file into memory, parse it, do the replacement and save it, or create a temporary file and stream-write to it the changes, then overwrite the original file with the temporary file. Assuming that test.csv is not too big to fit into your memory, the first approach is much simpler:

with open("test.csv", "r+U") as f:  # open the file in read-write mode
    # again, you can use csv.DictReader() for convenience, but this is significantly faster
    reader = csv.reader(f)
    header = next(reader)  # get the header
    rows = []  # hold our rows
    if "state" in header:  # proceed only if `state` column is found in the header
        state_index = header.index("state")  # find the state column index
        for row in reader:  # read the CSV row by row
            current_state = row[state_index]  # get the abbreviated state value
            # replace the abbreviation if it exists in our state_map
            row[state_index] = state_map.get(current_state, current_state)
            rows.append(row)  # append the processed row to our `rows` list
        # now lets overwrite the file with updated data
        f.seek(0)  # seek to the file begining
        f.truncate()  # truncate the rest of the content
        writer = csv.writer(f)  # create a CSV writer
        writer.writerow(header)  # write back the header
        writer.writerows(rows)  # write our modified rows

Upvotes: 1

ExtractTable.com
ExtractTable.com

Reputation: 811

Step 1: Ask Python to remember the abbreviated full names, so we are using dictionary for that

with open('state_abbreviations.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)
    abs = {r[0]: r[1] for r in csvreader}

step 2: Replace the abbreviations with full names and write to an output, I used "test_output.csv"

with open('test.csv', 'r') as reading:
    csvreader = csv.reader(reading)
    next(csvreader)
    header = ['name', 'gender', 'birthdate', 'address', 'city', 'state']
    with open( 'test_output.csv', 'w' ) as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for a in csvreader:
            writer.writerow(a[0], a[1], a[2], a[3], a[4], abs[a[5]])

Upvotes: 1

Invibsid
Invibsid

Reputation: 71

It seems like you are trying to go through the file twice? This is absolutely not necessary: the first time you go through you are already reading all the lines, so you can then create your dictionary items directly.

In addition, comprehension can be very useful when creating lists or dictionaries. In this case it might be a bit less readable though. The alternative would be to create an empty dictionary, start a "real" for-loop and adding all the key:value pairs manually. (i.e: with state_dict[row[abbr]] = row[name])

Finally, I used the with statement when opening the file to ensure it is safely closed when we're done with it. This is good practice when opening files.

import csv

with open("state_abbreviations.csv") as state_file:
    state_reader = csv.DictReader(state_file)
    state_dict = {row['state_abbr']: row['state_name'] for row in state_reader}

print(state_dict)

Edit: note that, like the code you showed, this only creates the dictionary that maps abbreviations to state names. Actually replacing them in the second file would be the next step.

Upvotes: 1

Related Questions