Reputation: 41
I have been looking for a solution and trying out different approaches to solve the problem in the title but haven't succeeded unfortunately.
Here is an example of what I would like to do. Say you have a dataframe with four columns:
df = pd.DataFrame(
{
"c1": ["A", "B", "C", "D", "E", "F"],
"c2": ["A", "B", "X", "D", "C", "E"],
"c3": ["B", "C", "A", "X", "E", "G"],
"c4": ["D", "E", "B", "C", "X", "A"],
},
index=[0, 1, 2, 3, 4, 5],
)
Now I would like to create a fifth column that will contain only the common elements of the other four:
df = pd.DataFrame(
{
"c1": ["A", "B", "C", "D", "E", "F"],
"c2": ["A", "B", "X", "D", "C", "E"],
"c3": ["B", "C", "A", "X", "E", "G"],
"c4": ["D", "E", "B", "C", "X", "A"],
"c5": ["A", "B", "C", "E", NaN, NaN]
},
index=[0, 1, 2, 3, 4, 5],
)
In other words, if an element is in multiple columns from c1 to c4, include it in column c5.
This should be possible by using the merge function, but as I said haven't succeeded.
Upvotes: 0
Views: 261
Reputation: 14949
You can use functools reduce
:
from functools import reduce
import numpy as np
df['col5'] = pd.Series(reduce(lambda x,y: (np.intersect1d(x, y)), df.T.values))
RESULT:
c1 c2 c3 c4 col5
0 A A B D A
1 B B C E B
2 C X A B C
3 D D X C E
4 E C E X NaN
5 F E G A NaN
Upvotes: 1