Reputation: 31988
I have one large dataframe that acts as a map between integers and names:
from StringIO import StringIO
import pandas as pd
gene_int_map = pd.read_table(StringIO("""Gene Int
Mt-nd1 2
Cers2 4
Nampt 10
Madd 20
Zmiz1 21
Syt1 26
Syt5 30
Syt7 32
Cdca7 34
Ablim2 42
Elp5 43
Clic1 98
Ece2 100"""), sep="\s+")
Then I have another dataframe where I want to convert the Gene
column to the ints given in the map (the names in to_convert
can be overwritten):
to_convert = pd.read_table(StringIO("""Gene Term
Mt-nd1 GO:0005739
Mt-nd1 GO:0005743
Mt-nd1 GO:0016021
Mt-nd1 GO:0030425
Mt-nd1 GO:0043025
Mt-nd1 GO:0070469
Mt-nd1 GO:0005623
Mt-nd1 GO:0005622
Mt-nd1 GO:0005737
Madd GO:0016021
Madd GO:0045202
Madd GO:0005886
Zmiz1 GO:0005654
Zmiz1 GO:0043231
Cdca7 GO:0005622
Cdca7 GO:0005623
Cdca7 GO:0005737
Cdca7 GO:0005634
Cdca7 GO:0005654"""), sep="\s+")
Like I said, what I'd like to do is replace the names in to_convert
with the integer values from gene_int_map
.
I'm sure this is super-simple, but it seems no permutations of options for merge will do it. I could not get any boolean masks to work either.
Ps. I'd also like to replace the values in a one-column dataframe with the integers in gene_int_map
:
simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""), squeeze=False)
It would be nice if the answer was general enough to include this case.
Upvotes: 3
Views: 2924
Reputation: 14841
Maybe you can create a dictionary like:
dictionary = dict(zip(gene_int_map.Gene, gene_int_map.Int))
And then replace values (using map
as suggested by @EdChum):
to_convert['Gene'].map(dictionary)
Having the dictionary created before will speed up the mapping.
Upvotes: 1
Reputation: 394269
Call set_index
on the 'Gene' column in gene_int_map
and pass this as the param to map
and call this on your 'Gene' column on your other df:
In [119]:
to_convert['Gene'].map(gene_int_map.set_index('Gene')['Int'])
Out[119]:
0 2
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 20
10 20
11 20
12 21
13 21
14 34
15 34
16 34
17 34
18 34
Name: Gene, dtype: int64
This also works for your simple_series
:
In [120]:
simple_series['Gene'].map(gene_int_map.set_index('Gene')['Int'])
Out[120]:
0 42
1 43
2 98
3 100
Name: Gene, dtype: int64
Upvotes: 2