Reputation: 702
I have a dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [10, 13, 15, 30],
                   "b:1": [np.nan, np.nan, 13, 14],
                   "b:2": [6, 7, np.nan, np.nan]})
I would like to combine the columns whose names start with "b:" into a single column "b". In this case I could simply use
df["b"] = df["b:1"].combine_first(df["b:2"])
but this is a small example of a larger dataframe, which can also contain "b:3" and so on, as well as other columns such as "c:1" and "c:2" that I do not want to merge.
Could anyone show me how to do this so that my final dataframe looks like this:
df
Out[23]:
    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0
Upvotes: 1
Views: 822
Reputation: 25353
Another possible solution:
df['b'] = df.T[lambda x: x.index.str.startswith('b:')].ffill().bfill().iloc[0]
Output:
    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0
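The same "first non-null value per row" idea can probably be written without the transpose, by back-filling across the columns instead. This is only a sketch on the question's sample data; it assumes the columns to merge are exactly those whose names match the regex ^b: and that the leftmost non-null value should win.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 13, 15, 30],
                   "b:1": [np.nan, np.nan, 13, 14],
                   "b:2": [6, 7, np.nan, np.nan]})

# Select only the columns whose names start with "b:"
b_cols = df.filter(regex=r"^b:")

# Back-fill along each row so the first column holds the first non-null value
df["b"] = b_cols.bfill(axis=1).iloc[:, 0]

print(df)
```

Because the selection is by prefix, additional columns such as "b:3" would be picked up automatically, while "c:1" and "c:2" would be left alone.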
Upvotes: 0
Reputation: 867
This might help you out:
from functools import reduce
import pandas as pd
import numpy as np

df = ...  # define DataFrame
exclude_cols = ['c', 'd']  # Base names whose columns should be excluded from merging
included_cols = []  # Base names that have already been merged
for col in list(df.columns):
    if ':' in col:
        base_col = col.split(':')[0]
        if base_col in exclude_cols or base_col in included_cols:
            continue
        # Collect every column sharing this prefix, e.g. "b:1", "b:2", ...
        associated_cols = [c for c in df.columns if c.startswith(f"{base_col}:")]
        # Take the first non-null value per row, in column order
        df[base_col] = reduce(lambda x, y: x.combine_first(y), [df[c] for c in associated_cols])
        included_cols.append(base_col)
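As a usage sketch, the same combine_first chain can be wrapped in a small function that takes the prefixes to merge rather than the ones to skip. The name merge_prefixed and the extra "c:1" / "c:2" columns are hypothetical, added only to show that unlisted prefixes are left untouched.

```python
from functools import reduce

import numpy as np
import pandas as pd


def merge_prefixed(df, prefixes):
    """Return a copy of df with one combined column per prefix in `prefixes`."""
    out = df.copy()
    for prefix in prefixes:
        cols = [c for c in df.columns if c.startswith(f"{prefix}:")]
        if not cols:
            continue  # nothing to merge for this prefix
        # Chain combine_first so earlier columns take priority in each row
        out[prefix] = reduce(lambda x, y: x.combine_first(y), [df[c] for c in cols])
    return out


# Sample data from the question, plus illustrative "c:" columns
df = pd.DataFrame({"a": [10, 13, 15, 30],
                   "b:1": [np.nan, np.nan, 13, 14],
                   "b:2": [6, 7, np.nan, np.nan],
                   "c:1": [1, 2, 3, 4],
                   "c:2": [5, 6, 7, 8]})

result = merge_prefixed(df, ["b"])
print(result)
```

An allow-list may be easier to maintain than an exclude list when only a few prefixes should ever be merged; which one fits better depends on the real dataframe.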
Upvotes: 0
Reputation: 3845
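You can use str.contains on df.columns and then sum on axis=1. The pattern '^b:' keeps the match to columns that start with "b:". Note that this gives the combined value only when each row has at most one non-null value among those columns:
col_b = df.columns[df.columns.str.contains('^b:')]
df['b'] = df[col_b].sum(axis=1)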
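To see where summing and combine_first can diverge, here is a small sketch on made-up data (not the question's dataframe): one row has values in both columns and one row has none. The min_count=1 argument is an optional extra, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 2 has two values, row 3 has none
df = pd.DataFrame({"b:1": [np.nan, 13.0, 1.0, np.nan],
                   "b:2": [6.0, np.nan, 2.0, np.nan]})

b_cols = df.columns[df.columns.str.startswith("b:")]

# Plain sum: overlapping values are added, and all-NaN rows become 0.0
summed = df[b_cols].sum(axis=1)

# min_count=1 keeps NaN for rows that have no values at all
summed_nan = df[b_cols].sum(axis=1, min_count=1)

# combine_first picks the first non-null value instead of adding
combined = df["b:1"].combine_first(df["b:2"])

print(pd.DataFrame({"sum": summed, "sum_min_count": summed_nan, "combine_first": combined}))
```

If the "b:" columns never overlap and never leave a row empty, as in the question's sample, all three give the same result.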
Upvotes: 3