Reputation: 81
i have a pandas dataframe like below table. for each SITEID in the first column, i've same value for other columns like Priority, Region and Vendor but not same in the History column.
SITEID Priority Region Vendor HISTORY
====== ========= ====== ======= =================================================================
E1149 P3 R10 NSN 09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
=====================================================================================================
E1149 P3 R10 NSN 09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault: 00: 16
=====================================================================================================
E1149 P3 R10 NSN 09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150 P3 R10 E// 09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
=====================================================================================================
E1150 P3 R10 E// 09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================
at first i want to merge the first four columns (SITEID, Priority, Region and Vendor) per each siteID and then put all the relevant records in the History column against it like below:
SITEID Priority Region Vendor HISTORY
====== ========= ====== ======= =================================================================
E1149 P3 R10 NSN 09-09 : ZRBSCN8, LUE1149 : Connector Faulty : 00: 31
09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabling Fault:00: 16
09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault : 01: 02
=====================================================================================================
E1150 P3 R10 E// 09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear : 01: 13
09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
=====================================================================================================
what is the most efficient way to do this in the excel output file using xlswriter etc? i tried many solutions like swaplevel
but no result.
Upvotes: 1
Views: 95
Reputation: 71707
You can try a simple groupby
and agg
using .join
with delimiter \n
:
cols = ['SITEID', 'Priority', 'Region', 'Vendor']
df_merged = df.groupby(cols, as_index=False).agg('\n'.join)
Then save this merged dataframe to excel as:
df_merged.to_excel('file.xlsx')
Result:
Upvotes: 1
Reputation: 16683
You can simply look for duplicate rows for the relevant columns with .loc
. There are two parts to loc: 1. Rows and 2. columns:
rows
, you can use .duplicated()
and specify the columns to look for duplicates by passing subset=([])
. This will return True
for duplicate rows.columns
, you can use just pass the columns for the values that you want to change to blanks with ,['SITEID','Priority','Region','Vendor']]
rows
and columns
to blank with == ''
df.loc[df.duplicated(subset=(['SITEID','Priority','Region','Vendor'])),['SITEID','Priority','Region','Vendor']] = ''
df
Out[1]:
SITEID Priority Region Vendor \
0 E1149 P3 R10 NSN
1
2
3 E1150 P3 R10 E//
4
HISTORY
0 09-09 : ZRBSCN8, LUE1149 : Connector Faulty : ...
1 09-08 : ZRBSCN8, LUE1149 (Fluctuation)BSS Cabl...
2 09-07 : ZRBSCN8, LUE1149 : BSS Cabling Fault :...
3 09-09 : BABSCE3, LUE1150 & LUT7695 : Unclear :...
4 09-08 : BABSCE3, E1150 & T7695 : Unclear : 00: 18
df.to_csv('test.csv', index=False)
Upvotes: 1