LamaMo
LamaMo

Reputation: 626

Match a value in the column and return another column in pandas | python

I have an input file1 of two columns(tab-separted):

c1\tc2
aaa\t232 65 19 32
bbew\t32 22 20
jhsi\t986 1 32 463 221

And input file2 which have one column:

c1
19
1
32
277

what I want is to search an element from file2 in file1, and return the corresponding value in c1. If there is more than one matched value, then return all together in one column.

Here is what should the output file be like:

19       aaa
1        jhsi
32       aaa bbew jhsi
277      

277 would be left empty because it does not exist.

Any suggestion will be helpful.

Upvotes: 1

Views: 2838

Answers (1)

jpp
jpp

Reputation: 164843

This isn't easily vectorisable. For performance, I suggest you perform your transformation before you put data in a Pandas dataframe. Here is a solution using collections.defaultdict:

# use set for O(1) lookup
scope_set = set(df2['c1'])

# initialise defualtdict of lists
dd = defaultdict(list)

# iterate and create dictionary mapping numbers to keys
for row in df1.itertuples(index=False):
    for num in map(int, row.c2.split()):
        if num in scope_set:
            dd[num].append(row.c1)

# construct dataframe from defaultdict
df = pd.DataFrame({'num': list(dd), 'keys': list(map(' '.join, dd.values()))})

# reindex to include blanks
df = df.set_index('num').reindex(sorted(scope_set)).reset_index()

Result

print(df)

   num           keys
0    1           jhsi
1   19            aaa
2   32  aaa bbew jhsi
3  277            NaN

Setup

from io import StringIO
from collections import defaultdict

file1 = StringIO("""c1\tc2
aaa\t232 65 19 32
bbew\t32 22 20
jhsi\t986 1 32 463 221""")

file2 = StringIO("""c1
19
1
32
277""")

df1 = pd.read_csv(file1, sep='\t')
df2 = pd.read_csv(file2)

Upvotes: 2

Related Questions