J. Doe
J. Doe

Reputation: 156

How to Manipulate Data in csv file columns using Pandas?

My data contains three columns, with one of them being a city location. I want to add a new column to my .csv file that adds a zip code corresponding to the city in the city column. I don't know how to implement a conditional expression that achieves this. Currently, I am using Pandas to read in my csv file, and I understand how to add a column to my csv file, but I can't find any information on how to manipulate the data in the way described above.

Edit:

My data looks something like this (about a million rows down). I have another csv file with two columns, one containing a state and a corresponding zip code. I want to match the state in the below csv file to the zip code in another csv file. My dataset also does not contain that many states so I was thinking I could take advantage of a simple conditional expression or case statement that sets a new column equal to a certain zip code if a certain state is in that row.

enter image description here

Upvotes: 0

Views: 844

Answers (1)

grge
grge

Reputation: 145

I'm assuming that the other csv file has columns "STATE" and "ZIPCODE", and it contains exactly one row per state, and has state abbreviations in the same format as the large csv file.

You want to do a "merge" operation on the state column. Specifying the "left join" style merge will ensure that every row of your original dataset is preserved, even if you don't have the corresponding zipcode.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

https://pandas.pydata.org/pandas-docs/stable/merging.html

This should do what you need:

big_df = pandas.read_csv("big_file.csv")
zipcodes_df = pandas.read_csv("zipcodes.csv")
merged_df = big_df.merge(zipcode_df, on='STATE', how='left')
merged_df.to_csv("merged_big_file.csv", index=False)

Upvotes: 2

Related Questions