Alex Lenail

Reputation: 14440

How to merge / join with pandas Index type

Problem Statement

(note: the "sample data" section below is more succinct)

I have a pandas index:

Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10', 'RNF11', 'RNF13', 'REM1', 'REM2',
       'C16orf13', 'MVB12B',
       ...
       'MFAP1', 'CWC22', 'PLRG1', 'PRPF40A', 'SAP30BP', 'PIK3R1', 'MYPN',
       'RBMX2', 'USP12', 'CHERP'],
      dtype='object', length=854)

It represents a list of keys, and what matters to me is each key's position in the Index (e.g. nodes.get_loc('PLRG1') # => 846).

Now I also have a list of observations, each of which has an associated key (result of df.info() below):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 2 columns):
name     58 non-null object
prize    58 non-null float64
dtypes: float64(1), object(1)

The name column holds names like those in my Index. I want to do a join, essentially a DataFrame merge, between my DataFrame and my Index, so that each row in the DataFrame gets the corresponding numerical position from the Index.

I can't use DataFrame.merge:

ValueError: can not merge DataFrame with instance of type <class 'pandas.indexes.base.Index'>

What should I do?

A larger question: what is the pandas Index type for? I feel like I might be misusing it, despite the fact that, from an abstract standpoint, what I need here is clearly an "Index".

Some sample data:

index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])

# dataframe looks like: 
    name    prize
0   RNF10   0.81
1   UBE2Q2  0.29
2   RNF14   2.68

# result I'm looking for: 
    name    prize
3   RNF10   0.81
2   UBE2Q2  0.29
0   RNF14   2.68

Upvotes: 1

Views: 311

Answers (1)

sodd

Reputation: 12923

You could use the DataFrame's set_index method combined with the Index's get_indexer method:

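import pandas as pd

index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])
df = pd.DataFrame([['RNF10', 0.81], ['UBE2Q2', 0.29], ['RNF14', 2.68]],
                  columns=['name', 'prize'])

# get_indexer returns the integer position in `index` of each name in df['name'];
# set_index then uses those positions as the row labels of a new DataFrame.
new_df = df.set_index(index.get_indexer(df['name']))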

This gives:

In [5]: df
Out[5]: 
     name  prize
0   RNF10   0.81
1  UBE2Q2   0.29
2   RNF14   2.68

In [6]: new_df
Out[6]:
     name  prize
3   RNF10   0.81
2  UBE2Q2   0.29
0   RNF14   2.68
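
One caveat worth checking: get_indexer returns -1 for any name that is not in the Index, so a missing key would silently end up labelled -1. The sketch below shows one way to filter such rows out, and also an alternative that builds a small lookup DataFrame from the Index so that an ordinary merge works. The name 'FOO1' and the column name 'node_id' are made up for illustration and are not part of the question's data.

```python
import pandas as pd

index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])

# 'FOO1' is a hypothetical name that is absent from the index.
df = pd.DataFrame({'name': ['RNF10', 'UBE2Q2', 'RNF14', 'FOO1'],
                   'prize': [0.81, 0.29, 2.68, 1.00]})

# get_indexer returns -1 for names that are not in the index.
positions = index.get_indexer(df['name'])

# Keep only the rows whose name was found, then label them by position.
found = positions >= 0
by_position = df[found].set_index(positions[found])

# Alternative: turn the Index into a two-column lookup table and merge.
# 'node_id' is an illustrative column name holding each key's position.
lookup = pd.DataFrame({'name': index, 'node_id': range(len(index))})
merged = df.merge(lookup, on='name', how='left')
```

With how='left', rows whose name is missing from the Index are kept and get NaN in node_id (which makes that column float); how='inner' would drop them instead.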

Upvotes: 1
