Reputation:
Let's say I have file1.csv:
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
"CORP:John Smith:,John Smith:" mail/123456.nsf
STORE:Mary Poppins: mail/789012.nsf
STORE:Tony Stark: mail/345678.nsf
STORE:Carmen Sandiego: mail/901234.nsf
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
STORES:store123 mail/369121.nsf
Then file2.csv:
OU
CORP
STORE
NEWS
For every line in file2.csv that has 'CORP', 'STORE', or 'NEWS', I want to search through file1.csv and create a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.
So a file like STOREall.csv should have:
OU Mailfile
STORE:Mary Poppins: mail/789012.nsf
STORE:Tony Stark: mail/345678.nsf
STORE:Carmen Sandiego: mail/901234.nsf
STORES:store123 mail/369121.nsf
CORPall.csv:
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
CORP:John Smith:,John Smith: mail/123456.nsf
Then NEWSall.csv
OU Mailfile
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
If I can also have it with just OU column and not Mailfile column, that would also be good. But I think I can just do a usecols=['OU'] for that.
I added more rows that I bumped into on the way:
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
"CORP:John Smith:,John Smith:" mail/123456.nsf
STORE:Mary Poppins: mail/789012.nsf
STORE:Tony Stark: mail/345678.nsf
STORE:Carmen Sandiego: mail/901234.nsf
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
STORES:store123 mail/369121.nsf
NEWS:CORPmanager mail/137112.nsf
NEWS:STOREmanager mail/083561.nsf
I'm getting these outputs with the last two NEWS rows since there is corp in it too:
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
CORP:John Smith:,John Smith: mail/123456.nsf
NEWS:CORPmanager mail/137112.nsf
Upvotes: 1
Views: 84
Reputation: 9867
You could read both files into dataframes, loop through the values in the 'OU' column of file2.csv, filter the data from file1.csv and save as individual CSV files.
import pandas as pd
data = pd.read_csv('file1.csv')
depts = pd.read_csv('file2.csv')
for dept in depts['OU']:
df_dept = data[data['OU'].str.contains(dept)]
df_dept['OU'].to_csv(f'{dept}all.csv', index=False)
Upvotes: 2