dom_2108

Python TypeError: reduction operation 'argmin' not allowed for this dtype

I have a DataFrame (df) with 10 columns that I'm displaying in a Streamlit dashboard. One of the operations I run on the df uses idxmin():

df.loc[df.groupby('ID').created_date.idxmin()]

The original df has multiple rows per ID, so I'm using idxmin() to return only one row for each ID: the one with the oldest created_date. However, I keep getting the error TypeError: reduction operation 'argmin' not allowed for this dtype.

From what I've read, converting the ID column (currently object dtype) to a numeric dtype should fix this. However, many of the IDs cannot be converted. For example, these are the first 5 IDs in my df:

   ID
 0 5F8306CE-5331-449F-9035-87D0C370E3A9
 1 14720
 2 FFDE5CB4-5DFD-48B7-8682-959124A11990
 3 29927
 4 00055450
 

The IDs that contain hyphens raise ValueError: Unable to parse string...
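A minimal sketch of why numeric conversion of ID is a dead end, using strings modelled on the five IDs above (assumed here to be stored as text): strict conversion raises on the first GUID, and `errors="coerce"` only "works" by replacing every GUID with NaN and stripping the leading zeros from `00055450`. Since `groupby` drops NaN keys by default, those rows would silently disappear.

```python
import pandas as pd

# IDs modelled on the sample above: GUID-style and numeric-looking strings
ids = pd.Series([
    "5F8306CE-5331-449F-9035-87D0C370E3A9",
    "14720",
    "FFDE5CB4-5DFD-48B7-8682-959124A11990",
    "29927",
    "00055450",
])

# Strict conversion raises ValueError on the first hyphenated value
try:
    pd.to_numeric(ids)
    strict_failed = False
except ValueError:
    strict_failed = True

# Coercion turns each GUID into NaN and "00055450" into 55450.0,
# so distinct IDs are lost or altered
coerced = pd.to_numeric(ids, errors="coerce")
print(strict_failed)
print(coerced)
```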

I also can't change these IDs (for example, by removing the hyphens) because they refer to real data. How else can I return one row for each ID, based on the oldest created_date, while keeping all 10 columns of the df?


Answers (1)

Vishal

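The ID column is not the problem; the column idxmin() reduces is created_date. Make sure created_date is not object dtype. If it is (for example, dates stored as strings), convert it to datetime. The ID column can stay as it is.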
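A minimal sketch of that fix on made-up data where IDs actually repeat. The `status` column is hypothetical and stands in for the other columns of the real df; the IDs stay as plain strings throughout. `errors="coerce"` turns unparseable dates into NaT, so it is worth checking for NaT before calling idxmin().

```python
import pandas as pd

# Hypothetical data: repeated string IDs, dates stored as strings
df = pd.DataFrame({
    "ID": ["FFDE5CB4-5DFD-48B7-8682-959124A11990", "14720",
           "FFDE5CB4-5DFD-48B7-8682-959124A11990", "14720", "00055450"],
    "created_date": ["2022-03-01", "2021-07-15", "2021-11-30",
                     "2022-01-02", "2022-05-05"],
    "status": ["open", "closed", "open", "open", "closed"],  # stand-in for other columns
})

# Convert the column that idxmin() reduces; bad dates become NaT
df["created_date"] = pd.to_datetime(df["created_date"], errors="coerce")

# Index label of the oldest row per ID, then select those full rows with .loc
oldest = df.loc[df.groupby("ID")["created_date"].idxmin()]
print(oldest)
```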

To reproduce your issue, I used the following steps:

import pandas as pd

# list of dates
dts = ['2021-12-12', '2022-12-03', '2022-09-22', '2022-01-01', '2022-08-12']

# list of IDs (numeric and string)
ids = ['54-44-ff-12', 14729, 'FF-24-11-CD', 29927, '00055450']

# create a pandas dataframe with these values
df = pd.DataFrame(columns=['created_date', 'ID'])
df['created_date'] = dts
df['ID'] = ids

# check data types
print(df.dtypes)

>>> created_date    object
>>> ID              object
>>> dtype: object

# running `idxmin()` while created_date is still object dtype throws an error
df.loc[df.groupby('ID').created_date.idxmin()]

>>> TypeError: reduction operation 'argmin' not allowed for this dtype

# let's change created_date to datetime 
df['created_date'] = pd.to_datetime(df['created_date'])

# now `idxmin()` runs without any issue
df.loc[df.groupby('ID').created_date.idxmin()]

# result: one row per ID with all columns kept. Every ID in this sample
# appears only once, so all five rows are returned.
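An alternative that avoids idxmin() altogether is to sort by date and keep the first row per ID with `drop_duplicates`. This is a swapped-in technique, not the one used above; the sketch assumes created_date is already datetime and uses a hypothetical `amount` column to show that every column is kept. A stable sort means ties on the date keep the row that appeared first in the original frame, and NaT dates sort last by default.

```python
import pandas as pd

# Hypothetical frame: repeated string IDs, datetime dates, one extra column
df = pd.DataFrame({
    "ID": ["5F8306CE-5331-449F-9035-87D0C370E3A9", "14720",
           "5F8306CE-5331-449F-9035-87D0C370E3A9", "14720"],
    "created_date": pd.to_datetime(
        ["2022-02-01", "2022-06-01", "2021-12-25", "2022-03-03"]),
    "amount": [10, 20, 30, 40],
})

# Oldest rows first, then keep the first (oldest) row seen for each ID
oldest = (
    df.sort_values("created_date", kind="stable")
      .drop_duplicates(subset="ID", keep="first")
)
print(oldest)
```

The result is ordered by date rather than by ID; append `.sort_index()` if the original row order matters.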
