Kevin Connell
Kevin Connell

Reputation: 31

Which.max not working but which.min is (in R)

I have a table with a list of transactions across a range of different campaigns (or projects). What I'm trying to do is derive when each campaign was started based on the day of the first transaction in each. The data covers all transactions across all campaigns with only following two columns being relevant: Campaign.ID (numeric), Created.At (yyyy-mm-dd). This is what I have at the moment:

temp$Recency = as.integer(today() - temp$Created.At) # Get number of days ago
df = temp %>% # Multiple transactions per user - we only need the most recent
  group_by(Campaign.ID) %>% # Group transactions together by campaign
  slice(which.max(temp$Recency)) # Find first transaction - ie biggest Recency value
df = as.data.frame(df)

So I create the temp$Recency column to change the date column to an integer of days from today. I then group the data based on their campaign ID and try to select only the first transaction for each ie. the one with the largest value in temp$Recency

What's puzzling me is that this code works fine if I use which.min for finding the last transaction of each campaign and returns a data frame with a single (most recent) instance for each campaign. Yet when I try it with which.max it returns an empty data frame.

I don't understand the reason for this, surely if which.min works as expected then which.max should too. I've tried searching for an answer but haven't come across anyone with the same issue.


The data looks something like this:

Created.At Campaign.ID Recency
2018-06-21        1883      13
2018-06-21        1890      13
2018-06-20        1883      14

There are about ~3000 records across 50-60 different campaigns

which.min returns something like this:

Created.At Campaign.ID Recency
2017-07-02          19     367
2017-05-25          91     405
2017-06-06         344     393
2017-06-30         451     369
2017-06-30         509     369
2017-08-16         551     322

I need the same thing but for which.max

Upvotes: 1

Views: 2408

Answers (1)

Mikko Marttila
Mikko Marttila

Reputation: 11878

You should remove the temp$ from your slice() call.

As it is, which.max(temp$Recency) will return the index of the maximum value in the entire data (not one for each Campaign.ID), which then is out of bounds for any of the individual groups defined by Campaign.ID (I'm surprised this doesn't throw an error).

The expression in slice() is evaluated in the context of the data. However, as your data don't contain a column called temp, when the call is evaluated, temp$Recency gets found in the global environment instead. As a result, the code that you have is equivalent to:

max_row <- which.max(temp$Recency)
temp %>% 
  group_by(Campaign.ID) %>% 
  slice(max_row)

Then any groups with less than max_row observations just get left out.

However, if you write slice(which.max(Recency)), then Recency is found as a column in the, now grouped data, and is correctly evaluated in each group.

So, to get the result you are after, just write:

df %>%
  group_by(Campaign.ID) %>%
  slice(which.max(Recency))

as @LAP commented.

As to why which.min seemed to work, it's probably because the row with the global minimum is likely one of the first (maybe even the first?), so that most (if not all) of the groups have at least that many rows. I.e. if which.min(temp$Recency) == 1 you would always just get the first row in each subgroup.

Upvotes: 1

Related Questions