Bora
Bora

Reputation: 41

How to create a new column from common elements of other multiple columns?

I have been looking for a solution and trying out different approaches to solve the problem in the title but haven't succeeded unfortunately.

Here is an example of what I would like to do. Say you have a dataframe with four columns:

df = pd.DataFrame(
    {
        "c1": ["A", "B", "C", "D", "E", "F"],
        "c2": ["A", "B", "X", "D", "C", "E"],
        "c3": ["B", "C", "A", "X", "E", "G"],
        "c4": ["D", "E", "B", "C", "X", "A"],
    },
    index=[0, 1, 2, 3, 4, 5],
)

Now I would like to create a fifth column that will contain only the common elements of the other four:

df = pd.DataFrame(
{
        "c1": ["A", "B", "C", "D", "E", "F"],
        "c2": ["A", "B", "X", "D", "C", "E"],
        "c3": ["B", "C", "A", "X", "E", "G"],
        "c4": ["D", "E", "B", "C", "X", "A"],
        "c5": ["A", "B", "C", "E", NaN, NaN]
    },
    index=[0, 1, 2, 3, 4, 5],
)

In other words, if an element is in multiple columns from c1 to c4, include it in column c5.

This should be possible by using the merge function, but as I said haven't succeeded.

Upvotes: 0

Views: 261

Answers (1)

Nk03
Nk03

Reputation: 14949

You can use functools reduce:

from functools import reduce
import numpy as np

df['col5'] = pd.Series(reduce(lambda x,y: (np.intersect1d(x, y)), df.T.values))

RESULT:

  c1 c2 c3 c4 col5
0  A  A  B  D    A
1  B  B  C  E    B
2  C  X  A  B    C
3  D  D  X  C    E
4  E  C  E  X  NaN
5  F  E  G  A  NaN

Upvotes: 1

Related Questions