Danny Kofi-Armah
Danny Kofi-Armah

Reputation: 1

Merge multiple Excel files with varied rows into one Excel file in pandas

I have 4 Excel files that I have to merge into one Excel file. Demography file containing ID, Initials, Age, and Sex. Laboratory file containing ID, Initials Test name, Test date, and Test Value. Medical History containing ID, Initials, Medical condition, Start and Stop Dates. Medication given containing ID, Initials, Drug name, dose, frequency, start and stop dates.

There are 50 patients. The demography file contains all 50 rows of 50 patients. The rest of the files have 50 patients but between 100 to 400 rows because each patient has multiple lab tests or multiple drugs.

When I merge in pandas, I have duplicates or assignment of entities to wrong patients. The challenge is to do this a way such that where you have a patient with more medications given than lab tests, the lab test should replace the duplicates with whitespaces.

This is a shortened representation:

import pandas as pd 
lab = pd.read_excel('data/data.xlsx', sheetname='lab') 
drugs = pd.read_excel('data/data.xlsx', sheetname='drugs') 
merged_data = pd.merge(drugs, lab, on='ID', how='left')
merged_data.to_excel('merged_data.xls')

You get this result: Pandas merge result

I would prefer this result: Prefered output

Upvotes: 0

Views: 481

Answers (1)

Parfait
Parfait

Reputation: 107567

Consider using cumcount() on a groupby() and then join on both that field with ID:

drugs['GrpCount'] = (drugs.groupby(['ID'])).cumcount()

lab['GrpCount'] = (lab.groupby(['ID'])).cumcount()

merged_data = pd.merge(drugs, lab, on=['ID', 'GrpCount'], how='left').drop(['GrpCount'], axis=1)

#     ID Initials_x                      Drug Name          Frequency          Route   Start Date     End Date Initials_y                    Name Result Date    Result
# 0    1         AB                       AMPICLOX                NaN           Oral  21-Jun-2016  21-Jun-2016         AB  Rapid Diagnostic Test    30-May-16  Abnormal
# 1    1         AB                  CIPROFLOXACIN              Daily           Oral  30-May-2016  03-Jun-2016         AB              Microscopy   30-May-16    Normal
# 2    1         AB        Ibuprofen Tablet 400 mg    Two Times a Day           Oral  06-Oct-2016  10-Oct-2016        NaN                     NaN         NaN       NaN
# 3    1         AB                        COARTEM                NaN           Oral  17-Jun-2016  17-Jun-2016        NaN                     NaN         NaN       NaN
# 4    1         AB          INJECTABLE ARTESUNATE          12 Hourly    Intravenous  01-Jun-2016  02-Jun-2016        NaN                     NaN         NaN       NaN
# 5    1         AB                  COTRIMOXAZOLE              Daily           Oral  30-May-2016  12-Jun-2016        NaN                     NaN         NaN       NaN
# 6    1         AB                  METRONIDAZOLE    Two Times a Day           Oral  30-May-2016  03-Jun-2016        NaN                     NaN         NaN       NaN
# 7    2         SS                     GENTAMICIN              Daily    Intravenous  04-Jun-2016  04-Jun-2016         SS              Microscopy    6-Jun-16  Abnormal
# 8    2         SS                  METRONIDAZOLE           8 Hourly    Intravenous  04-Jun-2016  06-Jun-2016         SS    Complete Blood Count    6-Oct-16  Recorded
# 9    2         SS  Oral Rehydration Salts Powder                PRN           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN
# 10   2         SS                           ZINC           8 Hourly           Oral  06-Jun-2016  06-Jun-2016        NaN                     NaN         NaN       NaN

Upvotes: 1

Related Questions