Reputation: 23
I have a dataframe with the column names:
'01 2017' '02 2017' ...... '12 2017' '01 2018' '01 2018' ....
so you can see there is a repeat of 01 2018 and 01 2018 for two columns. I would like to sum the columns without rearranging the columns.
I have this code currently:
df.groupby(lambda x:x, axis=1).sum()
However, the ouput is:
'01 2017' '01 2018' ... '12 2017' '12 2018'
This sorts it by the first 2 digits and messes up the arrangement.
Upvotes: 2
Views: 50
Reputation: 153460
I think you need reindex
, using @piRSquared setup modified:
Use df.columns.unique true a list of unique column names in the current order, then use reindex with axis=1 to order the columns after the groupby with sum.
df = pd.DataFrame(
np.random.randint(10, size=(6, 5)),
columns='1 2017|2 2017|1 2018|2 2018|2 2018'.split('|')
)
df.groupby(df.columns, axis=1).sum().reindex(df.columns.unique(), axis=1)
Output:
1 2017 2 2017 1 2018 2 2018
0 9 2 4 15
1 5 0 0 10
2 1 8 5 10
3 1 1 9 8
4 9 0 0 0
5 6 1 4 5
Upvotes: 2
Reputation: 323276
By using the data from Pir , slightly change it
Data input
np.random.seed([3, 1415])
df = pd.DataFrame(
np.random.randint(10, size=(5, 4)),
columns='1 2017|2 2017|1 2018|1 2018'.split('|')
)
df
Out[346]:
1 2017 2 2017 1 2018 1 2018
0 0 2 7 3
1 8 7 0 6
2 8 6 0 2
3 0 4 9 7
4 3 2 4 3
solution
df.sum(level=0,axis=1) # using `sum`
Out[347]:
1 2017 2 2017 1 2018
0 0 2 10
1 8 7 6
2 8 6 2
3 0 4 16
4 3 2 7
Upvotes: 2