Match a value in the column and return another column in pandas | python

Question

I have an input file1 of two columns(tab-separted):

c1	c2
aaa	232 65 19 32
bbew	32 22 20
jhsi	986 1 32 463 221

And input file2 which have one column:

what I want is to search an element from file2 in file1, and return the corresponding value in c1. If there is more than one matched value, then return all together in one column.

Here is what should the output file be like:

19       aaa
1        jhsi
32       aaa bbew jhsi
277

277 would be left empty because it does not exist.

Any suggestion will be helpful.

jpp · Accepted Answer

This isn't easily vectorisable. For performance, I suggest you perform your transformation before you put data in a Pandas dataframe. Here is a solution using collections.defaultdict:

# use set for O(1) lookup
scope_set = set(df2['c1'])

# initialise defualtdict of lists
dd = defaultdict(list)

# iterate and create dictionary mapping numbers to keys
for row in df1.itertuples(index=False):
    for num in map(int, row.c2.split()):
        if num in scope_set:
            dd[num].append(row.c1)

# construct dataframe from defaultdict
df = pd.DataFrame({'num': list(dd), 'keys': list(map(' '.join, dd.values()))})

# reindex to include blanks
df = df.set_index('num').reindex(sorted(scope_set)).reset_index()

Result

print(df)

   num           keys
0    1           jhsi
1   19            aaa
2   32  aaa bbew jhsi
3  277            NaN

Setup

from io import StringIO
from collections import defaultdict

file1 = StringIO("""c1	c2
aaa	232 65 19 32
bbew	32 22 20
jhsi	986 1 32 463 221""")

file2 = StringIO("""c1
19
1
32
277""")

df1 = pd.read_csv(file1, sep='	')
df2 = pd.read_csv(file2)

Match a value in the column and return another column in pandas | python

Answers (1)

Related Questions