user17841968
user17841968

Reputation:

Comparing data in two CSVs and creating a third CSV for matching data

Let's say I have file1.csv:

OU                                   Mailfile   
CORP:Jenny Smith:                    mail/246802.nsf
"CORP:John Smith:,John Smith:"       mail/123456.nsf
STORE:Mary Poppins:                  mail/789012.nsf
STORE:Tony Stark:                    mail/345678.nsf
STORE:Carmen Sandiego:               mail/901234.nsf
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
STORES:store123                      mail/369121.nsf

Then file2.csv:

OU                      
CORP
STORE     
NEWS

For every line in file2.csv that has 'CORP', 'STORE', or 'NEWS', I want to search through file1.csv and create a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.

So a file like STOREall.csv should have:

OU                                   Mailfile
STORE:Mary Poppins:                  mail/789012.nsf
STORE:Tony Stark:                    mail/345678.nsf
STORE:Carmen Sandiego:               mail/901234.nsf
STORES:store123                      mail/369121.nsf

CORPall.csv:

OU                                   Mailfile
CORP:Jenny Smith:                    mail/246802.nsf
CORP:John Smith:,John Smith:         mail/123456.nsf

Then NEWSall.csv

OU                                   Mailfile
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf

If I can also have it with just OU column and not Mailfile column, that would also be good. But I think I can just do a usecols=['OU'] for that.


I added more rows that I bumped into on the way:

OU                                   Mailfile   
CORP:Jenny Smith:                    mail/246802.nsf
"CORP:John Smith:,John Smith:"       mail/123456.nsf
STORE:Mary Poppins:                  mail/789012.nsf
STORE:Tony Stark:                    mail/345678.nsf
STORE:Carmen Sandiego:               mail/901234.nsf
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
STORES:store123                      mail/369121.nsf
NEWS:CORPmanager                     mail/137112.nsf
NEWS:STOREmanager                    mail/083561.nsf

I'm getting these outputs with the last two NEWS rows since there is corp in it too:

OU                                   Mailfile
CORP:Jenny Smith:                    mail/246802.nsf
CORP:John Smith:,John Smith:         mail/123456.nsf
NEWS:CORPmanager                     mail/137112.nsf

Upvotes: 1

Views: 84

Answers (1)

norie
norie

Reputation: 9867

You could read both files into dataframes, loop through the values in the 'OU' column of file2.csv, filter the data from file1.csv and save as individual CSV files.

import pandas as pd

data = pd.read_csv('file1.csv')

depts = pd.read_csv('file2.csv')

for dept in depts['OU']:
    df_dept = data[data['OU'].str.contains(dept)]
    df_dept['OU'].to_csv(f'{dept}all.csv', index=False)

Upvotes: 2

Related Questions