gtomer

Reputation: 6574

Consolidating columns by the number before the decimal point in the column name

I have the following dataframe (three example columns below):

import pandas as pd
array = {'25.2': [False, True, False], '25.4': [False, False, True], '27.78': [True, False, True]}
df = pd.DataFrame(array)


    25.2    25.4    27.78
0   False   False   True
1   True    False   False
2   False   True    True

I want to create a new dataframe with consolidated column names, i.e. merge 25.2 and 25.4 into a single new column 25. If any of the values in the original columns is True, the value in the new column should be True.

Expected output:

      25     27
0   False   True
1   True    False
2   True    True

Any ideas?

Upvotes: 0

Views: 52

Answers (2)

Anurag Dabas

Reputation: 24324

Use rename() + groupby() + sum(), then cast the sums back to bool:

df=(df.rename(columns=lambda x:x.split('.')[0])
      .groupby(axis=1,level=0).sum().astype(bool))

OR

In 2 steps:

df.columns=[x.split('.')[0] for x in df]
#OR
#df.columns=df.columns.str.replace(r'\.\d+','',regex=True)
df=df.groupby(axis=1,level=0).sum().astype(bool)

output:

    25      27
0   False   True
1   True    False
2   True    True
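As far as I know, recent pandas releases emit a deprecation warning for groupby(axis=1). A minimal sketch of the same idea that avoids that argument by grouping the transposed frame, and that uses any() in place of sum().astype(bool); the variable names renamed and result are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    '25.2': [False, True, False],
    '25.4': [False, False, True],
    '27.78': [True, False, True],
})

# Keep only the part of each label before the decimal point.
renamed = df.rename(columns=lambda name: name.split('.')[0])

# Transpose so the labels become the index, group equal labels together,
# and mark a group True if at least one of its members is True.
result = renamed.T.groupby(level=0).any().T
print(result)
```

The final .T restores the original orientation, so the rows are still 0, 1, 2 and the columns are '25' and '27'.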

Note: If your column labels are floats rather than strings, use int() or math.floor() instead of split(). round() is not suitable here because it would turn 27.78 into 28.
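For the numeric-label case, a rough sketch under the assumption that the labels are floats (unlike the string labels in the question); floored and result are hypothetical names:

```python
import math
import pandas as pd

# Assumption: float column labels, not the strings used in the question.
df = pd.DataFrame({
    25.2: [False, True, False],
    25.4: [False, False, True],
    27.78: [True, False, True],
})

# Drop the fractional part of each label: 25.2 -> 25, 25.4 -> 25, 27.78 -> 27.
floored = df.rename(columns=lambda label: math.floor(label))

# Group equal labels on the transposed frame and combine them with any().
result = floored.T.groupby(level=0).any().T
print(result)
```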

Upvotes: 2

Corralien

Reputation: 120499

Another way:

>>> import numpy as np
>>> df.T.groupby(np.floor(df.columns.astype(float))).sum().astype(bool).T

    25.0   27.0
0  False   True
1   True  False
2   True   True
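The floored keys are floats, which is why the labels come out as 25.0 and 27.0. If integer labels like those in the expected output are wanted, one option is to cast the keys to int before grouping. A self-contained sketch (keys and result are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    '25.2': [False, True, False],
    '25.4': [False, False, True],
    '27.78': [True, False, True],
})

# Parse the string labels as floats, floor them, and cast to int
# so the group keys are 25 and 27 instead of 25.0 and 27.0.
keys = np.floor(df.columns.astype(float)).astype(int)

# Group the transposed frame by those keys; any() is True when at least
# one of the grouped columns is True.
result = df.T.groupby(keys).any().T
print(result)
```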

Upvotes: 1
