Reputation: 786
I have many columns that contain the same substring and i want to merge them to one using OR function.
So i want to merge all columns that have the same
dfin.columns.str.split("_").str[1]
first_RG7509| first_YY6124| last_YY6124| first_WE4818|first_AA7542| last_RG7509
1|0|1|1|0|0
and the output should be:
RG7509|YY6124|WE4818|AA7542
1|1|1|0
How can I achieve this?
Upvotes: 2
Views: 392
Reputation: 18916
You could do a duplicated check aswell:
df.columns = df.columns.str.split('_').str[1]
df = (df.T.sort_values(by=0)
.reset_index()
.drop_duplicates(subset='index', keep='last')
.set_index('index').T)
Full proof:
import pandas as pd
data = '''\
first_RG7509|first_YY6124|last_YY6124|first_WE4818|first_AA7542|last_RG7509
1|0|1|1|0|0'''
df = pd.read_csv(pd.compat.StringIO(data), sep='|')
df.columns = df.columns.str.split('_').str[1]
df = (df.T.sort_values(by=0)
.reset_index()
.drop_duplicates(subset='index', keep='last')
.set_index('index').T)
Upvotes: 0
Reputation: 375675
You can do a groupby with axis=1:
In [11]: df
Out[11]:
first_RG7509 first_YY6124 last_YY6124 first_WE4818 first_AA7542 last_RG7509
0 1 0 1 1 0 0
In [12]: df.groupby(lambda x: x.split("_")[1], axis=1).sum()
Out[12]:
AA7542 RG7509 WE4818 YY6124
0 0 1 1 1
Upvotes: 2
Reputation: 76346
You can take the transpose, groupby the second part of each string, then transpose back:
>>> df.T.groupby(df.T.index.str.split('_').str[1]).sum() > 0).T.astype(int)
AA7542 RG7509 WE4818 YY6124
0 0 1 1 1
Upvotes: 3