Reputation:
I have a dataframe that looks like this:
df = pd.DataFrame({'a':[1,0,1],'b':[0,1,0],'b1':[1,0,0],'c':[0,1,1]})
df.columns = ['a','b','b','c']
>>> df
a b b c
0 1 0 1 0
1 0 1 0 1
2 1 0 0 1
I want to merge those two different b
columns together, like this:
a b c
0 1 1 0
1 0 1 1
2 1 0 1
I understand that I could use |
(OR) in a bitwise context to combine them, e.g. with a
and c
:
>>> df['a'] | df['c']
0 1
1 1
2 1
dtype: int64
But I'm having trouble selecting the two individual b
columns, because of this:
>>> df['b']
b b
0 0 1
1 1 0
2 0 0
>>> df['b']['b']
b b
0 0 1
1 1 0
2 0 0
>>> df['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']
b b
0 0 1
1 1 0
2 0 0
Upvotes: 3
Views: 113
Reputation: 5433
Assuming that you have multiple groups of repeated columns, you can apply the same logic of not_speshal's solution to each group using DataFrame.groupby
.
# group the columns (axis=1) by their labels (level=0) and apply the logic to each group
df = df.groupby(level=0, axis=1).sum().clip(0, 1)
Upvotes: 1
Reputation: 71
Beside the answer suggested by not_speshal, you could also access the columns by index as follows:
df.iloc[:, 1] | df.iloc[:, 2]
Upvotes: 1
Reputation: 23146
Try with sum
and clip
:
df["b"] = df["b"].sum(axis=1).clip(0, 1)
#remove duplicate column
df = df.loc[:, ~df.columns.duplicated()]
Upvotes: 2