Reputation: 93
Issue 1 - solved by using pd.to_datetime(df.Date, format='%Y-%m-%d'). Thanks to Michael
I am trying to find the latest date of each user using their ID
df['Latest Date'] = df.groupby(['ID'])['Date'].transform.('max')
df.drop_duplicates(subset='ID', keep='last',inplace=True)
But I am getting '>=' not supported between instances of 'str' and 'float'
I have used the same approach in the past and it worked fine.
When I did dytypes, I see 'ID' column is int64 and Date column as object because I converted the date column to df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')
Issue 2 solved - See Michael's comment 'For the edit'
But the output does not look right
I am trying to find the latest date of each user using their ID and assign those dates to new columns using the category
Dataframe = df
Data looks like below,
ID CATEGORY NAME DATE
1 fruits 2017-08-07 00:00:00
2 veggies 2018-01-25 00:00:00
1 fruits 2015-08-07 00:00:00
2 veggies 2022-01-01 00:00:00
My code is below
//Converting the date format
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
//transforming to identify the latest date
df['Latest Date'] = df.groupby(['ID'])['Date'].transform.('max')
//keeping the last and dropping the duplicates
df.drop_duplicates(subset='ID', keep='last',inplace=True)
//inserting new columns
df['Fruits'] = ' '
df['Veggies'] = ' '
//applying the latest dates to the newly created columns
df.loc[((df['CATEGORY NAME'] == 'fruits')), 'Fruits'] = df['Latest Date']
df.loc[((df['CATEGORY NAME'] == 'veggies')), 'Veggies'] = df['Latest Date']
I want the output like below
ID CATEGORY NAME DATE Latest Date Fruits Veggies
1 fruits 2017-08-07 2017-08-07 2017-08-07
2 veggies 2022-01-01 2022-01-01 2022-01-01
But my output looks odd. I don't have an error message but the output is not right
ID CATEGORY NAME DATE Latest Date Fruits Veggies
1 fruits 2017-08-07
2 veggies 2022-01-01 2021-01-01 2021-01-01 00:00:00
If you notice above
Not sure what's wrong
Upvotes: 0
Views: 2253
Reputation: 3128
strftime
converts a date to string. Did you want to keep it as a datetime object but change the format? Try this instead:
df.Date = pd.to_datetime(df.Date, format='%Y-%m-%d')
I'm not sure why you want the "Date" and "Latest Date" columns to be the same, but here is the code that will give you your desired table output:
# Recreate dataframe
ID = [1,2,1,2]
CATEGORY_NAME = ["fruits", "veggies", "fruits", "veggies"]
DATE = ["2017-08-07 00:00:00", "2018-01-25 00:00:00", "2015-08-07 00:00:00", "2022-01-01 00:00:00"]
df = pd.DataFrame({"ID":ID,"CATEGORY NAME":CATEGORY_NAME, "Date":DATE})
# Convert datetime format
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
# Get the max date value and assign the group to a new dataframe
dfNew = df.groupby(['ID'], as_index=False).max()
# The new dataframes Date and Latest Date column are the same
dfNew['Latest Date'] = dfNew['Date']
# Fix latest Date formatting
dfNew["Latest Date"] = dfNew["Latest Date"].dt.date
# Add fruit and veggie columns
dfNew['Fruits'] = ' '
dfNew['Veggies'] = ' '
# Place in the desired values
dfNew.loc[((dfNew['CATEGORY NAME'] == 'fruits')), 'Fruits'] = dfNew['Latest Date']
dfNew.loc[((dfNew['CATEGORY NAME'] == 'veggies')), 'Veggies'] = dfNew['Latest Date']
dfNew
Output:
ID CATEGORY NAME Date Latest Date Fruits Veggies
0 1 fruits 2017-08-07 2017-08-07 2017-08-07
1 2 veggies 2022-01-01 2022-01-01 2022-01-01
Upvotes: 2