zara kolagar
zara kolagar

Reputation: 901

pandas: combine columns if they share a partly similar name

I have a DataFrame as follows,

import pandas as pd
df = pd.DataFrame({'sent_a.1': [0, 3, 2, 1],
               'sent_a.2': [0, 1, 4, 0],
               'sent_b.3': [0, 6, 0, 8],
               'sent_b.4': [1, 1, 8, 6]
               })

I want to combine the columns that share a name. so the desired output would be,

    sent_a  sent_b  
0     0,0    0,1
1     3,1    6,1
2     2,4    0,8 
3     1,0    8,6

I have done the following and the code works, but since my original DataFrame has more than 200 columns, I am looking for a faster way of doing that.

df['sent_a'] = df['sent_a.1']+','+ df['sent_a.2']
df['sent_b'] = df['sent_b.1']+','+ df['sent_b.2']
df = df.drop(['sent_a.1', 'sent_a.2', 'sent_b.3', 'sent_b.4'], axis=1)

Upvotes: 1

Views: 319

Answers (3)

user17242583
user17242583

Reputation:

You can groupby the first part of the columns names, with axis=1 (to group column-wise instead of row-wise), and for each group, use the built-in functools.reduce to join all the columns together:

import functools as ft
new_df = df.groupby(df.columns.str.split('.').str[0], axis=1).apply(lambda g: ft.reduce(lambda x, y: x + ',' + y, [x[col].astype(str) for col in x]))

Output:

>>> new_df
  sent_a sent_b
0    0,0    0,0
1    3,1    3,1
2    2,4    2,4
3    1,0    1,0

Upvotes: 2

mozway
mozway

Reputation: 262634

You can groupby on the columns using the left part of the column names as group:

# get left part of name
group = df.columns.str.split('.').str[0]

# groupby and join as string
(df.astype(str).groupby(group, axis=1)
   .apply(lambda d: d.apply(','.join, axis=1))
)

output:

  sent_a sent_b
0    0,0    0,1
1    3,1    6,1
2    2,4    0,8
3    1,0    8,6

Upvotes: 3

user7864386
user7864386

Reputation:

You can make the columns MultiIndex and unstack and groupby the index and apply join to get a Series similar to the desired outcome. swaplevel + unstack will fetch the desired DataFrame.

df.columns = pd.MultiIndex.from_tuples(col.split('.') for col in df.columns)
out = df.unstack().astype(str).droplevel(1).groupby(level=[0,1]).agg(','.join).swaplevel().unstack()

Output:

  sent_a sent_b
0    0,0    0,1
1    3,1    6,1
2    2,4    0,8
3    1,0    8,6

Upvotes: 3

Related Questions