(note: the "sample data" section below is more succinct)
I have a pandas index:
Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10', 'RNF11', 'RNF13', 'REM1', 'REM2',
'C16orf13', 'MVB12B',
...
'MFAP1', 'CWC22', 'PLRG1', 'PRPF40A', 'SAP30BP', 'PIK3R1', 'MYPN',
'RBMX2', 'USP12', 'CHERP'],
dtype='object', length=854)
It represents a list of keys, and the positions of those keys in the Index are what matter to me (e.g. nodes.get_loc('PLRG1')  # => 846).
Now I also have a list of observations, each of which has an associated key (result of df.info() below):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 2 columns):
name 58 non-null object
prize 58 non-null float64
dtypes: float64(1), object(1)
The name column holds names like those in my Index. I want to do a join, basically identical to a DataFrame merge, between my DataFrame and my Index, such that each row in my DataFrame gets the appropriate numerical ID (its position) from my Index.
I can't use DataFrame.merge, because it raises:
ValueError: can not merge DataFrame with instance of type <class 'pandas.indexes.base.Index'>
What should I do?
A larger question: what is the pandas Index type for? I feel like I might be misusing it, despite the fact that, from an abstract standpoint, what I need here is clearly an "Index".
Sample data:
index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])
# dataframe looks like:
name prize
0 RNF10 0.81
1 UBE2Q2 0.29
2 RNF14 2.68
# result I'm looking for:
name prize
3 RNF10 0.81
2 UBE2Q2 0.29
0 RNF14 2.68
Upvotes: 1
Views: 311
Reputation: 12923
You could use the DataFrame's set_index method combined with the Index's get_indexer method:
import pandas as pd
index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])
df = pd.DataFrame([['RNF10', 0.81], ['UBE2Q2', 0.29], ['RNF14', 2.68]], columns=['name', 'prize'])
# get_indexer maps each name to its integer position in index; those positions become the new row labels
new_df = df.set_index(index.get_indexer(df['name']))
This will give
In [5]: df
Out[5]:
name prize
0 RNF10 0.81
1 UBE2Q2 0.29
2 RNF14 2.68
In [6]: new_df
Out[6]:
name prize
3 RNF10 0.81
2 UBE2Q2 0.29
0 RNF14 2.68
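One caveat worth knowing: get_indexer returns -1 for any value that is not present in the Index, and it requires the Index to be unique. The sketch below shows one way to handle that, and also expresses the same lookup as an ordinary merge, which is closer to what the question asked for. The row 'NOT_A_KEY' and the column name node_id are made up for illustration; they are not part of the original data.

```python
import pandas as pd

index = pd.Index(['RNF14', 'UBE2Q1', 'UBE2Q2', 'RNF10'])
df = pd.DataFrame(
    [['RNF10', 0.81], ['UBE2Q2', 0.29], ['RNF14', 2.68], ['NOT_A_KEY', 1.0]],
    columns=['name', 'prize'],
)  # 'NOT_A_KEY' is a hypothetical name that is absent from the Index

# Integer position of each name in the Index; -1 marks names that were not found
positions = index.get_indexer(df['name'])

# Keep only the rows whose name exists in the Index, labelled by position
found = positions >= 0
new_df = df[found].set_index(positions[found])

# The same lookup as a merge: turn the Index into a two-column lookup table.
# 'node_id' is an assumed column name for the position.
lookup = pd.DataFrame({'name': index, 'node_id': range(len(index))})
merged = df.merge(lookup, on='name', how='left')  # unmatched rows get NaN
```

The merge version keeps the ID as a regular column rather than as the row labels, which may be more convenient if the original row order or labels still matter.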
Upvotes: 1