user17242583

Merging two identically-named columns in a dataframe

I have a dataframe that looks like this:

df = pd.DataFrame({'a':[1,0,1],'b':[0,1,0],'b1':[1,0,0],'c':[0,1,1]})
df.columns = ['a','b','b','c']

>>> df
   a  b  b  c
0  1  0  1  0
1  0  1  0  1
2  1  0  0  1

I want to merge those two different b columns together, like this:

   a  b  c
0  1  1  0
1  0  1  1
2  1  0  1

I know I can combine two columns with the bitwise | (OR) operator, e.g. a and c:

>>> df['a'] | df['c']
0    1
1    1
2    1
dtype: int64

But I'm having trouble selecting the two b columns individually, because of this:

>>> df['b']
   b  b
0  0  1
1  1  0
2  0  0

>>> df['b']['b']
   b  b
0  0  1
1  1  0
2  0  0

>>> df['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']
   b  b
0  0  1
1  1  0
2  0  0

Upvotes: 3

Views: 113

Answers (3)

Rodalm

Reputation: 5433

If you have several groups of repeated columns, you can apply the same logic as not_speshal's solution to each group using DataFrame.groupby.

# group the columns (axis=1) by their labels (level=0) and apply the logic to each group
df = df.groupby(level=0, axis=1).sum().clip(0, 1) 
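Note that passing axis=1 to DataFrame.groupby is deprecated as of pandas 2.1, where the suggested replacement is to transpose first. A minimal sketch of the same idea written that way, assuming the sample df from the question (the name merged is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 0], 'b1': [1, 0, 0], 'c': [0, 1, 1]})
df.columns = ['a', 'b', 'b', 'c']

# Transpose so the duplicated column labels become row labels,
# group those rows by label, sum each group, then transpose back.
merged = df.T.groupby(level=0).sum().T

# For 0/1 data, clipping each sum to at most 1 is equivalent to a bitwise OR.
merged = merged.clip(0, 1)
print(merged)
```

groupby sorts the group labels by default; pass sort=False if the original column order is not already alphabetical and you want to keep it.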

Upvotes: 1

mailach

Reputation: 71

Besides the answer suggested by not_speshal, you could also access the columns by position with iloc:

df.iloc[:, 1] | df.iloc[:, 2]
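The iloc expression returns only the merged Series. A minimal sketch of building the final frame from it, assuming the sample df from the question: keep a and c by position and insert the merged column back as b (merged_b and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 0], 'b1': [1, 0, 0], 'c': [0, 1, 1]})
df.columns = ['a', 'b', 'b', 'c']

# Positions 1 and 2 are the two b columns; OR them element-wise.
merged_b = df.iloc[:, 1] | df.iloc[:, 2]

# Keep a (position 0) and c (position 3), then put the merged b between them.
result = df.iloc[:, [0, 3]].copy()
result.insert(1, 'b', merged_b)
print(result)
```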

Upvotes: 1

not_speshal

Reputation: 23146

Try sum and clip (for 0/1 values, a row-wise sum clipped to 1 is the same as OR):

df["b"] = df["b"].sum(axis=1).clip(0, 1)

# remove the duplicate b column, keeping the first occurrence
df = df.loc[:, ~df.columns.duplicated()]
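A closely related variant (not part of the original answer) swaps sum and clip for DataFrame.any, which is True wherever any value in the row is nonzero; converting to int turns that back into 0/1. A minimal sketch, assuming the sample df from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 0], 'b1': [1, 0, 0], 'c': [0, 1, 1]})
df.columns = ['a', 'b', 'b', 'c']

# df["b"] selects both b columns; any(axis=1) is the row-wise OR,
# and assigning to the duplicated label writes the result into both b columns.
df["b"] = df["b"].any(axis=1).astype(int)

# Drop the second b column, keeping the first occurrence of each label.
df = df.loc[:, ~df.columns.duplicated()]
print(df)
```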

Upvotes: 2
