user026

Reputation: 702

How to merge columns if column name starts with specific substring?

I have a dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"a":[10, 13, 15, 30],
                  "b:1":[np.nan, np.nan, 13, 14],
                  "b:2":[6, 7, np.nan, np.nan]})

I would like to combine the columns whose names start with "b:" into a single column "b". In this case I could simply use df["b"] = df["b:1"].combine_first(df["b:2"]), but this is a simplified example of a larger dataframe that may also contain "b:3" and so on, as well as other columns such as "c:1" and "c:2", which I do not want to merge.

Could anyone show me how to do this so that my final dataframe looks like this:

df
Out[23]: 
    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0

Upvotes: 1

Views: 822

Answers (3)

PaulS

Reputation: 25353

Another possible solution:

df['b'] = df.T[lambda x: x.index.str.startswith('b:')].ffill().bfill().iloc[0]

Output:

    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0
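
A row-wise variant of the same idea, as a minimal sketch rather than a definitive implementation: select the columns whose names start with "b:", back-fill across the columns, and keep the first one. The helper name coalesce_prefix and the extra "c:1" column are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd


def coalesce_prefix(df, prefix):
    """Return, per row, the first non-null value among columns starting with `prefix`."""
    # Hypothetical helper: pick the columns by prefix, in their existing order
    cols = [c for c in df.columns if str(c).startswith(prefix)]
    # Back-fill along the columns so the first column holds the first non-null value
    return df[cols].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame({"a": [10, 13, 15, 30],
                   "b:1": [np.nan, np.nan, 13, 14],
                   "b:2": [6, 7, np.nan, np.nan],
                   "c:1": [1, 2, 3, 4]})  # "c:1" is left untouched

df["b"] = coalesce_prefix(df, "b:")
```

Because only the requested prefix is selected, columns such as "c:1" are not merged. If a row has values in several "b:" columns, the leftmost one wins, which matches the behaviour of chaining combine_first.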

Upvotes: 0

frisko

Reputation: 867

This might help you out:

from functools import reduce

import pandas as pd
import numpy as np

df = ...  # define DataFrame

exclude_cols = ['c', 'd']  # List the columns that should be excluded from merging

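included_cols = []
for col in df.columns:
    if ':' in col:
        base_col = col.split(':')[0]
        # Skip excluded prefixes and prefixes that were already merged
        if base_col in exclude_cols or base_col in included_cols:
            continue
        # Collect every column that shares this prefix, e.g. "b:1", "b:2", ...
        associated_cols = [c for c in df.columns if c.startswith(f"{base_col}:")]
        # Take the first non-null value across the associated columns
        df[base_col] = reduce(lambda x, y: x.combine_first(y), [df[c] for c in associated_cols])
        included_cols.append(base_col)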

Upvotes: 0

Nuri Taş

Reputation: 3845

You can use str.startswith on df.columns to select the "b:" columns and then sum on axis=1:

col_b = df.columns[df.columns.str.startswith('b:')]
df['b'] = df[col_b].sum(axis=1)
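
One caveat with summing, shown as a small sketch on made-up data: sum(axis=1) gives the same result as combining only when at most one of the selected columns is filled in each row. Rows where every column is NaN come out as 0 unless min_count=1 is passed, and rows with several values are added together.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 2 is all-NaN, row 3 has values in both columns
df = pd.DataFrame({"b:1": [np.nan, 13, np.nan, 1],
                   "b:2": [6, np.nan, np.nan, 2]})

cols = df.columns[df.columns.str.startswith("b:")]

plain = df[cols].sum(axis=1)              # all-NaN row becomes 0.0
safe = df[cols].sum(axis=1, min_count=1)  # all-NaN row stays NaN
```

In both versions the last row is 3.0 (1 + 2), so the sum approach is only a substitute for combine_first when the "b:" columns never overlap.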

Upvotes: 3
