Reputation: 167
I have a multi-indexed dataframe like below:
            col1 col2 col3 col4
    row1 0     A    A    b    b
         1     B    B    c    c
    row2 0     A    B    d    d
         1     B    B    e    e
I would like to know the most efficient way to concatenate the strings within each first-level group, column by column (row1/col1, row1/col2, etc.), so that the result is:
         col1 col2 col3 col4
    row1   AB   AB   bc   bc
    row2   AB   BB   de   de
So far, the best (and only) way I can see to do this is:
    dx = pd.concat(
        [df[col].unstack().apply(lambda row: row.str.cat(sep=''), axis=1)
         for col in df.columns],
        axis=1,
    )
    dx.columns = df.columns
In practice, this dataframe is 1.5 million rows by 1,000 columns, so a more efficient approach than looping over every column would be most welcome!
Upvotes: 2
Views: 38
Reputation: 32095
Strings are sum-compatible (adding two strings concatenates them), so you can do this simply by grouping on the first level of the index and summing:
    df.groupby(level=0).apply(sum)
    Out[37]:
         col1 col2 col3 col4
    row1   AB   AB   bc   bc
    row2   AB   BB   de   de
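For anyone who wants to try this on the sample data, here is a minimal self-contained sketch. The frame construction is my reconstruction of the example in the question. It uses `agg("".join)` instead of `apply(sum)`, which I am assuming is equivalent for all-string columns; I have not timed it on a frame of the size mentioned.

```python
import pandas as pd

# Rebuild the sample frame from the question (a reconstruction, not the real data).
index = pd.MultiIndex.from_product([["row1", "row2"], [0, 1]])
df = pd.DataFrame(
    {
        "col1": ["A", "B", "A", "B"],
        "col2": ["A", "B", "B", "B"],
        "col3": ["b", "c", "d", "e"],
        "col4": ["b", "c", "d", "e"],
    },
    index=index,
)

# Group on the first index level and join the strings of each column
# within every group. Assumes every cell is a string (no NaN values).
result = df.groupby(level=0).agg("".join)

print(result)
```

Joining explicitly makes the intent (string concatenation) visible and does not depend on how pandas treats the built-in `sum` inside `apply`.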
Upvotes: 2