Replacing strings within a pandas DataFrame with a value which is currently an index.

Question

I have some output from an analysis (parsed into a pandas DataFrame) that needs some post-processing. Here is what the dataframe looks like:

                                    1         2              3         4    
index         GeneSymbol                                                     
11746909_a_at A1CF        11736238_a_at  0.038230    11724734_at  0.024966   
11736238_a_at ABCA5       11746909_a_at  0.038230    11724734_at  0.024771   
11724734_at   ABCB8       11746909_a_at  0.024966  11736238_a_at  0.024771   
11723976_at   ABCC8       11746909_a_at  0.017006  11736238_a_at  0.046125   
11718612_a_at ABCD4       11746909_a_at  0.014982  11736238_a_at  0.050172

Here we have a two-level MultiIndex: the outer level holds unique IDs and the inner level holds the symbols associated with those IDs. Columns $1,...,n$ alternate between an ID and a numerical value (giving the strength of a correlation). Every ID in these columns also appears in the index. My question is: what would be the best strategy to replace the uninformative IDs with the appropriate symbol?

For example, with only the first row converted, the output table would look like this:

                                    1         2              3         4    
index         GeneSymbol                                                     
11746909_a_at A1CF        ABCA5          0.038230    ABCB8        0.024966   
11736238_a_at ABCA5       11746909_a_at  0.038230    11724734_at  0.024771   
11724734_at   ABCB8       11746909_a_at  0.024966  11736238_a_at  0.024771   
11723976_at   ABCC8       11746909_a_at  0.017006  11736238_a_at  0.046125   
11718612_a_at ABCD4       11746909_a_at  0.014982  11736238_a_at  0.050172

Thanks in advance

jezrael · Accepted Answer

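You can use replace with a Series created by reset_index. Moving the GeneSymbol level out of the index leaves a Series indexed by ID whose values are the symbols, and replace uses it as an ID-to-symbol lookup: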

df = df.replace(df.reset_index(level=1)['GeneSymbol'])
print (df)
                              1         2      3         4
index         GeneSymbol                                  
11746909_a_at A1CF        ABCA5  0.038230  ABCB8  0.024966
11736238_a_at ABCA5        A1CF  0.038230  ABCB8  0.024771
11724734_at   ABCB8        A1CF  0.024966  ABCA5  0.024771
11723976_at   ABCC8        A1CF  0.017006  ABCA5  0.046125
11718612_a_at ABCD4        A1CF  0.014982  ABCA5  0.050172
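
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same replace call. The names id_to_symbol and result are only illustrative, and the frame is typed in by hand from the printed output, so dtypes may differ from the real data.

```python
import pandas as pd

# Rebuild the sample frame from the question (typed in by hand).
idx = pd.MultiIndex.from_tuples(
    [
        ("11746909_a_at", "A1CF"),
        ("11736238_a_at", "ABCA5"),
        ("11724734_at", "ABCB8"),
        ("11723976_at", "ABCC8"),
        ("11718612_a_at", "ABCD4"),
    ],
    names=["index", "GeneSymbol"],
)
df = pd.DataFrame(
    {
        1: ["11736238_a_at"] + ["11746909_a_at"] * 4,
        2: [0.038230, 0.038230, 0.024966, 0.017006, 0.014982],
        3: ["11724734_at", "11724734_at"] + ["11736238_a_at"] * 3,
        4: [0.024966, 0.024771, 0.024771, 0.046125, 0.050172],
    },
    index=idx,
)

# Series indexed by ID with the gene symbol as its value.
id_to_symbol = df.reset_index(level=1)["GeneSymbol"]

# replace treats the Series like a dict: index label -> replacement value.
result = df.replace(id_to_symbol)
print(result)
```

The numeric columns pass through unchanged because none of their values match an ID string.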

Another solution uses a dict built from the (ID, symbol) tuples returned by Index.values:

df = df.replace(dict(df.index.values))
print(df)
                              1         2      3         4
index         GeneSymbol                                  
11746909_a_at A1CF        ABCA5  0.038230  ABCB8  0.024966
11736238_a_at ABCA5        A1CF  0.038230  ABCB8  0.024771
11724734_at   ABCB8        A1CF  0.024966  ABCA5  0.024771
11723976_at   ABCC8        A1CF  0.017006  ABCA5  0.046125
11718612_a_at ABCD4        A1CF  0.014982  ABCA5  0.050172
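
If the frame is large, one possible variation is to translate only the ID columns with Series.map instead of letting replace scan every column. This is a sketch and not part of the original answer: it uses the first three rows of the sample data, and id_cols is an assumption about which columns hold IDs. Any ID that is missing from the index is left unchanged via fillna.

```python
import pandas as pd

# First three rows of the sample data from the question.
idx = pd.MultiIndex.from_tuples(
    [
        ("11746909_a_at", "A1CF"),
        ("11736238_a_at", "ABCA5"),
        ("11724734_at", "ABCB8"),
    ],
    names=["index", "GeneSymbol"],
)
df = pd.DataFrame(
    {
        1: ["11736238_a_at", "11746909_a_at", "11746909_a_at"],
        2: [0.038230, 0.038230, 0.024966],
        3: ["11724734_at", "11724734_at", "11736238_a_at"],
        4: [0.024966, 0.024771, 0.024771],
    },
    index=idx,
)

# Each MultiIndex entry is an (ID, symbol) tuple, so dict() gives ID -> symbol.
mapping = dict(df.index.tolist())

# Assumption: the odd-numbered columns are the ones that hold IDs.
id_cols = [1, 3]

out = df.copy()
for col in id_cols:
    # map looks each ID up in the dict; unknown IDs become NaN,
    # so fillna restores the original value for those.
    out[col] = out[col].map(mapping).fillna(out[col])

print(out)
```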
