Reputation: 901
I have a DataFrame as follows,
import pandas as pd
df = pd.DataFrame({'sent_a.1': [0, 3, 2, 1],
'sent_a.2': [0, 1, 4, 0],
'sent_b.3': [0, 6, 0, 8],
'sent_b.4': [1, 1, 8, 6]
})
I want to combine the columns that share a name. so the desired output would be,
sent_a sent_b
0 0,0 0,1
1 3,1 6,1
2 2,4 0,8
3 1,0 8,6
I have done the following and the code works, but since my original DataFrame has more than 200 columns, I am looking for a faster way of doing that.
df['sent_a'] = df['sent_a.1']+','+ df['sent_a.2']
df['sent_b'] = df['sent_b.1']+','+ df['sent_b.2']
df = df.drop(['sent_a.1', 'sent_a.2', 'sent_b.3', 'sent_b.4'], axis=1)
Upvotes: 1
Views: 319
Reputation:
You can groupby the first part of the columns names, with axis=1
(to group column-wise instead of row-wise), and for each group, use the built-in functools.reduce
to join all the columns together:
import functools as ft
new_df = df.groupby(df.columns.str.split('.').str[0], axis=1).apply(lambda g: ft.reduce(lambda x, y: x + ',' + y, [x[col].astype(str) for col in x]))
Output:
>>> new_df
sent_a sent_b
0 0,0 0,0
1 3,1 3,1
2 2,4 2,4
3 1,0 1,0
Upvotes: 2
Reputation: 262634
You can groupby
on the columns using the left part of the column names as group:
# get left part of name
group = df.columns.str.split('.').str[0]
# groupby and join as string
(df.astype(str).groupby(group, axis=1)
.apply(lambda d: d.apply(','.join, axis=1))
)
output:
sent_a sent_b
0 0,0 0,1
1 3,1 6,1
2 2,4 0,8
3 1,0 8,6
Upvotes: 3
Reputation:
You can make the columns MultiIndex and unstack
and groupby
the index and apply join
to get a Series similar to the desired outcome. swaplevel
+ unstack
will fetch the desired DataFrame.
df.columns = pd.MultiIndex.from_tuples(col.split('.') for col in df.columns)
out = df.unstack().astype(str).droplevel(1).groupby(level=[0,1]).agg(','.join).swaplevel().unstack()
Output:
sent_a sent_b
0 0,0 0,1
1 3,1 6,1
2 2,4 0,8
3 1,0 8,6
Upvotes: 3