Consolidating columns by the number before the decimal point in the column name

Question

I have the following dataframe (three example columns below):

import pandas as pd
array = {'25.2': [False, True, False], '25.4': [False, False, True], '27.78': [True, False, True]}
df = pd.DataFrame(array)


    25.2    25.4    27.78
0   False   False   True
1   True    False   False
2   False   True    True

I want to create a new dataframe with consolidated column names, i.e. merge 25.2 and 25.4 into a single new column 25. If any of the values in the original columns is True, then the value in the new column is True.

Expected output:

      25     27
0   False   True
1   True    False
2   True    True

Any ideas?

Anurag Dabas · Accepted Answer

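Use rename() + groupby() + sum():

# Keep only the part of each column name before the decimal point,
# then sum columns that share a name; any non-zero sum becomes True.
df = (df.rename(columns=lambda x: x.split('.')[0])
        .groupby(axis=1, level=0).sum().astype(bool))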

OR

In 2 steps:

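# Step 1: strip everything from the decimal point onward in each column name
df.columns = [x.split('.')[0] for x in df]
# OR
# df.columns = df.columns.str.replace(r'\.\d+', '', regex=True)
# Step 2: combine columns that now share a name
df = df.groupby(axis=1, level=0).sum().astype(bool)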

Output:

    25      27
0   False   True
1   True    False
2   True    True
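
Since DataFrame.groupby(axis=1) is deprecated as of pandas 2.1, the same consolidation can probably be written by transposing first. This is only a sketch of an alternative, not the answer's original method; it assumes string column names as in the question, and it uses any() in place of sum().astype(bool):

```python
import pandas as pd

df = pd.DataFrame({
    '25.2': [False, True, False],
    '25.4': [False, False, True],
    '27.78': [True, False, True],
})

# Transpose so the column names become the index, group the rows by the
# text before the decimal point, reduce each group with any(), and
# transpose back to the original orientation.
result = df.T.groupby(lambda name: name.split('.')[0]).any().T
print(result)
```

The resulting column names are the strings '25' and '27'; convert them with astype(int) if numeric labels are needed.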

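Note: If your column names are floats rather than strings, split() will not work; truncate each name with int() instead. round() is not equivalent here, since it would turn 27.78 into 28.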
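
For numeric column labels, a minimal sketch might look like the following. It assumes the labels are floats such as 25.2 and that grouping on the transposed frame is acceptable; the names float_df and groups are illustrative and do not come from the question:

```python
import pandas as pd

# Hypothetical variant of the question's data with float column labels
float_df = pd.DataFrame({
    25.2: [False, True, False],
    25.4: [False, False, True],
    27.78: [True, False, True],
})

# int() truncates toward zero, so 25.2 -> 25 and 27.78 -> 27
# (round() would give 28 for the last label).
groups = [int(label) for label in float_df.columns]

# Group the transposed rows by the truncated labels, then transpose back.
result = float_df.T.groupby(groups).any().T
print(result)
```

The new column labels here are the integers 25 and 27.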
