The Unfun Cat
The Unfun Cat

Reputation: 31988

How use dataframe as map to change values in another dataframe

I have one large dataframe that acts as a map between integers and names:

from StringIO import StringIO
import pandas as pd

gene_int_map = pd.read_table(StringIO("""Gene       Int
Mt-nd1   2
Cers2   4
Nampt   10
Madd    20
Zmiz1   21
Syt1        26
Syt5    30
Syt7        32
Cdca7   34
Ablim2  42
Elp5    43
Clic1   98
Ece2    100"""), sep="\s+")

Then I have another dataframe where I want to convert the Gene column to the ints given in the map (the names in to_convert can be overwritten):

to_convert = pd.read_table(StringIO("""Gene    Term
Mt-nd1  GO:0005739
Mt-nd1  GO:0005743
Mt-nd1  GO:0016021
Mt-nd1  GO:0030425
Mt-nd1  GO:0043025
Mt-nd1  GO:0070469
Mt-nd1  GO:0005623
Mt-nd1  GO:0005622
Mt-nd1  GO:0005737
Madd    GO:0016021
Madd    GO:0045202
Madd    GO:0005886
Zmiz1   GO:0005654
Zmiz1   GO:0043231
Cdca7   GO:0005622
Cdca7   GO:0005623
Cdca7   GO:0005737
Cdca7   GO:0005634
Cdca7   GO:0005654"""), sep="\s+")

Like I said, what I'd like to do is replace the names in to_convert with the integer values from gene_int_map.

I'm sure this is super-simple, but it seems no permutations of options for merge will do it. I could not get any boolean masks to work either.

Ps. I'd also like to replace the values in a one-column dataframe with the integers in gene_int_map:

simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""), squeeze=False)

It would be nice if the answer was general enough to include this case.

Upvotes: 3

Views: 2924

Answers (2)

Peque
Peque

Reputation: 14841

Maybe you can create a dictionary like:

dictionary = dict(zip(gene_int_map.Gene, gene_int_map.Int))

And then replace values (using map as suggested by @EdChum):

to_convert['Gene'].map(dictionary)

Having the dictionary created before will speed up the mapping.

Upvotes: 1

EdChum
EdChum

Reputation: 394269

Call set_index on the 'Gene' column in gene_int_map and pass this as the param to map and call this on your 'Gene' column on your other df:

In [119]:
to_convert['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[119]:
0      2
1      2
2      2
3      2
4      2
5      2
6      2
7      2
8      2
9     20
10    20
11    20
12    21
13    21
14    34
15    34
16    34
17    34
18    34
Name: Gene, dtype: int64

This also works for your simple_series:

In [120]:
simple_series['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[120]:
0     42
1     43
2     98
3    100
Name: Gene, dtype: int64

Upvotes: 2

Related Questions