David

Reputation: 500

DataFrame.lookup requires unique index and columns with a recent version of Pandas

I am working with Python 3.7 and have run into an issue with a recent version of pandas. Here is my code:

import pandas as pd
import numpy as np

data = {'col_1':[9087.6000, 9135.8000, np.nan, 9102.1000],
        'col_2':[0.1648, 0.1649, '', 5.3379],
        'col_nan':[np.nan, np.nan, np.nan, np.nan],
        'col_name':['col_nan', 'col_1', 'col_2', 'col_nan']
        }
df = pd.DataFrame(data, index=[101, 102, 102, 104])

col_lookup = 'results'
col_result = 'col_name'
df[col_lookup] = df.lookup(df.index, df[col_result])

The code works fine with pandas version 1.0.3, but when I try with version 1.1.1 the following error occurs:

"ValueError: DataFrame.lookup requires unique index and columns"

The dataframe indeed includes a duplication of the index "102".

For various reasons, I have to work with version 1.1.1 of pandas. Is there a way to make "lookup" work with a duplicated index in this version of pandas?

Thanks in advance for your help.

Upvotes: 1

Views: 811

Answers (2)

Sajad.sni

Reputation: 318

Accepting a non-unique index was a bug: GitHub link

The "lookup" method in pandas 1.1.1 does not accept a non-unique index. The following check was added at the beginning of the "lookup" method in "frame.py" (line 3836 in my installation, at the path below):

C:\Users\Sajad\AppData\Local\Programs\Python\Python38\Lib\site-packages\pandas\core\frame.py

if not (self.index.is_unique and self.columns.is_unique):
    # GH#33041
    raise ValueError("DataFrame.lookup requires unique index and columns")

However, if this check did not exist, the rest of the method would end up in a for loop. Substituting the last line of your code with that for loop, taken from the pandas source, gives you the same result as previous pandas versions. Note that "_get_value" is a private method, so it may change between releases.

result = np.empty(len(df.index), dtype="O")
for i, (r, c) in enumerate(zip(df.index, df[col_result])):
    result[i] = df._get_value(r, c)
df[col_lookup] = result
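
As an alternative that avoids both "lookup" and the private "_get_value" call, the same per-row selection can be done by position with NumPy, so duplicated index labels never come into play. This is only a sketch: it assumes the column labels are unique and that every name stored in 'col_name' is an existing column. The names col_numbers and row_numbers are mine, not from pandas.

```python
import numpy as np
import pandas as pd

data = {'col_1': [9087.6000, 9135.8000, np.nan, 9102.1000],
        'col_2': [0.1648, 0.1649, '', 5.3379],
        'col_nan': [np.nan, np.nan, np.nan, np.nan],
        'col_name': ['col_nan', 'col_1', 'col_2', 'col_nan']}
df = pd.DataFrame(data, index=[101, 102, 102, 104])

col_lookup = 'results'
col_result = 'col_name'

# Position of each requested column; get_indexer returns -1 for unknown labels.
col_numbers = df.columns.get_indexer(df[col_result])
assert (col_numbers >= 0).all(), "unknown column name in col_name"

# One row position per row, so index labels (duplicated or not) are never used.
row_numbers = np.arange(len(df))

# With mixed dtypes, to_numpy() returns an object array; pick one cell per row.
df[col_lookup] = df.to_numpy()[row_numbers, col_numbers]
```

Because the selection is purely positional, the original index (including the duplicated 102) is left untouched.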

Upvotes: 2

Rob Raymond

Reputation: 31156

Put a unique index in place, then restore the old index:

import pandas as pd
import numpy as np

data = {'col_1':[9087.6000, 9135.8000, np.nan, 9102.1000],
        'col_2':[0.1648, 0.1649, '', 5.3379],
        'col_nan':[np.nan, np.nan, np.nan, np.nan],
        'col_name':['col_nan', 'col_1', 'col_2', 'col_nan']
        }
df = pd.DataFrame(data, index=[101, 102, 102, 104])

col_lookup = 'results'
col_result = 'col_name'
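# Move the duplicated index into a column named "index"; df now has a unique RangeIndex
df.reset_index(inplace=True)
df[col_lookup] = df.lookup(df.index, df[col_result])
# Restore the original (duplicated) index and drop the helper column
df = df.set_index(df["index"]).drop(columns="index")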
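
A variation on the same idea, for pandas releases where "lookup" is not available (it was deprecated in pandas 1.2 and removed in 2.0): swap in a temporary unique RangeIndex, read each cell with ".at", then put the original index object back. This is a rough sketch, not a drop-in replacement. Keeping the old index in a variable (original_index, my own name) avoids relying on the helper column being called "index".

```python
import numpy as np
import pandas as pd

data = {'col_1': [9087.6000, 9135.8000, np.nan, 9102.1000],
        'col_2': [0.1648, 0.1649, '', 5.3379],
        'col_nan': [np.nan, np.nan, np.nan, np.nan],
        'col_name': ['col_nan', 'col_1', 'col_2', 'col_nan']}
df = pd.DataFrame(data, index=[101, 102, 102, 104])

col_lookup = 'results'
col_result = 'col_name'

# Remember the original (duplicated) index and replace it with a unique one.
original_index = df.index
df.index = pd.RangeIndex(len(df))

# With a unique index, .at returns exactly one scalar per (row, column) pair.
df[col_lookup] = [df.at[row, col] for row, col in zip(df.index, df[col_result])]

# Put the original index back.
df.index = original_index
```

The loop is slower than a vectorised selection on large frames, but it keeps the row-by-row logic explicit.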

Upvotes: 2
