Reputation: 551
I have multiple csv files that I read into individual data frames based on their name in the directory, like so
# ask user for path
path = input('Enter the path for the csv files: ')
os.chdir(path)
# loop over filenames and read into individual dataframes
for fname in os.listdir(path):
if fname.endswith('Demo.csv'):
demoRaw = pd.read_csv(fname, encoding = 'utf-8')
if fname.endswith('Key2.csv'):
keyRaw = pd.read_csv(fname, encoding = 'utf-8')
Then I filter to only keep certain columns
# filter to keep desired columns only
demo = demoRaw.filter(['Key', 'Sex', 'Race', 'Age'], axis=1)
key = keyRaw.filter(['Key', 'Key', 'Age'], axis=1)
Then I create a list of the above dataframes and use reduce to merge them on Key
# create list of data frames for combined sheet
dfs = [demo, key]
# merge the list of data frames on the Key
combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)
Then I drop the auto generated column, create an Excel writer and write to a csv
# drop the auto generated index colulmn
combined.set_index('RecordKey', inplace=True)
# create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('final.xlsx', engine='xlsxwriter')
# write to csv
combined.to_excel(writer, sheet_name='Combined')
meds.to_excel(writer, sheet_name='Meds')
# Close the Pandas Excel writer and output the Excel file.
writer.save()
The problem is some files have keys that aren't in others. For example
Demo file
Key Sex Race Age
1 M W 52
2 F B 25
3 M L 78
Key file
Key Key2 Age
1 7325 52
2 4783 25
3 1367 78
4 9435 21
5 7247 65
Right now, it will only include rows if there is a matching key in each (in other words it just leaves out the rows with keys not in the other files). How can I combine all rows from all files, even if keys don't match? So the end result will look like this
Key Sex Race Age Key2 Age
1 M W 52 7325 52
2 F B 25 4783 25
3 M L 78 1367 78
4 9435 21
5 7247 65
I don't care if the empty cells are blanks, NaN, #N/A, etc. Just as long as I can identify them.
Upvotes: 1
Views: 142
Reputation: 146
Replace combined = reduce(lambda left,right: pd.merge(left,right,on='Key'), dfs)
With: combined=pd.merge(demo,key, how='outer', on='Key')
You will have to specificy the 'outer' to join both the full table of Key and Demo
Upvotes: 1