james
james

Reputation: 23

merge dataframe columns with same name BUT without sorting the columns

I have a dataframe with the column names:

'01 2017' '02 2017' ...... '12 2017' '01 2018' '01 2018' ....

so you can see there is a repeat of 01 2018 and 01 2018 for two columns. I would like to sum the columns without rearranging the columns.

I have this code currently:

df.groupby(lambda x:x, axis=1).sum()

However, the ouput is:

'01 2017' '01 2018' ... '12 2017' '12 2018'

This sorts it by the first 2 digits and messes up the arrangement.

Upvotes: 2

Views: 50

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

I think you need reindex, using @piRSquared setup modified:

Use df.columns.unique true a list of unique column names in the current order, then use reindex with axis=1 to order the columns after the groupby with sum.

df = pd.DataFrame(
    np.random.randint(10, size=(6, 5)),
    columns='1 2017|2 2017|1 2018|2 2018|2 2018'.split('|')
)
df.groupby(df.columns, axis=1).sum().reindex(df.columns.unique(), axis=1)

Output:

   1 2017  2 2017  1 2018  2 2018
0       9       2       4      15
1       5       0       0      10
2       1       8       5      10
3       1       1       9       8
4       9       0       0       0
5       6       1       4       5

Upvotes: 2

BENY
BENY

Reputation: 323276

By using the data from Pir , slightly change it

Data input

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(5, 4)),
    columns='1 2017|2 2017|1 2018|1 2018'.split('|')
)
df
Out[346]: 
   1 2017  2 2017  1 2018  1 2018
0       0       2       7       3
1       8       7       0       6
2       8       6       0       2
3       0       4       9       7
4       3       2       4       3

solution

df.sum(level=0,axis=1) # using `sum`
Out[347]: 
   1 2017  2 2017  1 2018
0       0       2      10
1       8       7       6
2       8       6       2
3       0       4      16
4       3       2       7

Upvotes: 2

Related Questions