Reputation: 1987
This is what my dataframe looks like.
dt <- read.table(text='
Name ActivityType GrpID
John Sale 1
John Sale 2
John Webinar 3
Kyle Email 1
Kyle Seminar 2
Kyle Sale 3
Kyle Webinar 4
Kyle Sale 5
Tom Email 1
Tom Video 2
Tom Seminar 3
', header=T, row.names = NULL)
I want to do 3 things.
Return the remaining group where the first ActivityType is NOT "Sale" BUT there is an ActivityType = "Sale" in some later row(as described in 1 & 2) AND only show the results upto the first instance of ActivityType = Sale. So it should show
Name ActivityType GrpID
Kyle Email 1
Kyle Seminar 2
Kyle Sale 3
It does not have to be 3 steps as described. I just need the final output. I was thinking about using SD function in data.table but dont know how to add those conditions. I will sincerely appreciate your help.
Upvotes: 1
Views: 127
Reputation: 5532
Using dplyr
this will work on your example above
dt %>%
group_by(Name) %>%
filter( sum((GrpID == 1 & ActivityType=='Sale')) == 0 ) %>%
filter( sum(ActivityType=='Sale') > 0 ) %>%
filter( GrpID <= min(GrpID[ActivityType == 'Sale'])) %>%
ungroup
#Source: local data frame [3 x 3]
#
# Name ActivityType GrpID
# (fctr) (fctr) (int)
#1 Kyle Email 1
#2 Kyle Seminar 2
#3 Kyle Sale 3
There may be a cleaner way to do this though.
EDIT: I added the output and ungroup
to remove the grouping.
EDIT 2: Based on the suggestion from MichaelChirico
dt %>%
group_by(Name) %>%
filter( !any(ActivityType == 'Sale' & GrpID == 1) ) %>% # 1
filter( any(ActivityType == 'Sale') ) %>% # 2
filter( GrpID <= min(GrpID[ActivityType == 'Sale'])) %>% # 3
ungroup
The above solution uses any
instead of sum
(the %>%
is the pipe operator). That is not to say this can't be made more efficient though. I will gladly update this again if someone suggests a more efficient and / or more readable dplyr
solution.
EDIT 3
Below is an alternate solution for item 3 based on a comment/solution by @MichaelChirico. This combined all 3 conditions in one filter statement (not using the progressive filtering above).
dt %>%
group_by(Name) %>%
mutate(x = (ActivityType == 'Sale') ) %>%
filter( !x[1],
any(x),
row_number() <= which.max(x)) %>%
ungroup %>%
select(-x)
Upvotes: 3
Reputation: 898
# For those who prefer to roll their own
result.list <- by(dt, dt$Name, FUN = function(x) {
f <- match("Sale", x$ActivityType)
if(!is.na(f) & (f != 1) ) return(head(x, f))
})
result.df <- do.call(rbind, result.list)
Upvotes: 1
Reputation: 34703
In data.table
:
setDT(dt)
sl <- "Sale" #since we re-use it so much...
#1)
dt[ , if (!ActivityType[1L] == sl) .SD, by = Name]
#2)
dt[ , if (any(ActivityType == sl)) .SD, by = Name]
#3)
dt[ , {x <- ActivityType == sl; if(!x[1] & any(x)) .SD[1:which.max(x)]}, by = Name]
(note that the third case subsumes the first two, so I'm assuming you want three different outputs... otherwise just stick to the last)
Upvotes: 7