Ben
Ben

Reputation: 560

Vectorized dataframe look-up

I have two dataframes:

df = pd.DataFrame([['A', 'B'], ['B', 'A']], columns=['Mon', 'Tues'])

    Mon Tues
0   A   B
1   B   A

lookup = pd.DataFrame([[0, 1], [2, 3]], index=['A', 'B'], columns=df.columns)

    Mon Tues
A   0   1
B   2   3

For each day, for each key in the first dataframe, I would like to look up its value in the second dataframe. This is one way to do it:

output = pd.DataFrame()
for col in df.columns:
    output[col] = df[col].map(lookup[col])

    Mon Tues
0   0   3
1   2   1

Is there a way to get the same answer without the explicit loop?

Upvotes: 2

Views: 191

Answers (2)

Alex Riley
Alex Riley

Reputation: 176938

You can use replace to exchange the values:

>>> df.replace(lookup)
   Mon  Tues
0    0     3
1    2     1

When passed to the replace method, DataFrames are treated similarly to nested dictionaries. For each column label, the column's values are mapped to the correspondingly indexed value in the lookup DataFrame.

Upvotes: 2

Divakar
Divakar

Reputation: 221624

I am assuming that each column data in df is a random order of the lookup.index and that lookup.index is in sorted order. So, going with it, here's a NumPy based vectorized solution, specifically using its advanced-indexing -

idx = np.argsort(df.values,1)
df_out = pd.DataFrame(lookup.values[idx,np.arange(idx.shape[1])],columns=df.columns)

Sample output -

In [41]: idx = np.argsort(df.values,1)

In [42]: pd.DataFrame(lookup.values[idx,np.arange(idx.shape[1])],columns=df.columns)
Out[42]: 
   Mon  Tues
0    0     3
1    2     1

If lookup.index is not in sorted order, we need to do some additional work -

sidx = lookup.index.argsort()
arr_out = lookup.values[idx,np.arange(idx.shape[1])][sidx]

Upvotes: 2

Related Questions