Mahmood
Mahmood

Reputation: 81

How to merge multiple columns with same content in the excel output file using pandas

i have a pandas dataframe like below table. for each SITEID in the first column, i've same value for other columns like Priority, Region and Vendor but not same in the History column.

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
=====================================================================================================
E1149       P3        R10     NSN       09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16
=====================================================================================================
E1149       P3        R10     NSN       09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
=====================================================================================================
E1150       P3        R10     E//       09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

at first i want to merge the first four columns (SITEID, Priority, Region and Vendor) per each siteID and then put all the relevant records in the History column against it like below:

SITEID  Priority    Region  Vendor                          HISTORY
======  =========   ======  ======= =================================================================
E1149       P3        R10     NSN       09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
                                        09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault:00: 16
                                        09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150       P3        R10     E//       09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
                                        09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================

what is the most efficient way to do this in the excel output file using xlswriter etc? i tried many solutions like swaplevel but no result.

Upvotes: 1

Views: 95

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71707

You can try a simple groupby and agg using .join with delimiter \n:

cols = ['SITEID', 'Priority', 'Region', 'Vendor']
df_merged = df.groupby(cols, as_index=False).agg('\n'.join)

Then save this merged dataframe to excel as:

df_merged.to_excel('file.xlsx')

Result:

enter image description here

Upvotes: 1

David Erickson
David Erickson

Reputation: 16683

You can simply look for duplicate rows for the relevant columns with .loc. There are two parts to loc: 1. Rows and 2. columns:

  1. For rows, you can use .duplicated() and specify the columns to look for duplicates by passing subset=([]). This will return True for duplicate rows.
  2. For columns, you can use just pass the columns for the values that you want to change to blanks with ,['SITEID','Priority','Region','Vendor']]
  3. Finally, set these specified rows and columns to blank with == ''

df.loc[df.duplicated(subset=(['SITEID','Priority','Region','Vendor'])),['SITEID','Priority','Region','Vendor']] = ''
df
Out[1]: 
  SITEID Priority Region Vendor  \
0  E1149       P3    R10    NSN   
1                                 
2                                 
3  E1150       P3    R10    E//   
4                                 

                                             HISTORY  
0  09-09 : ZRBSCN8, LUE1149 : Connector Faulty : ...  
1  09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabl...  
2  09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault :...  
3  09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear :...  
4  09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18

df.to_csv('test.csv', index=False)

enter image description here

Upvotes: 1

Related Questions