Reputation: 52358
I am attempting to group_by
, locate the first appearance of a row that meets a criterion, and create a new column with a value based that selected row from each group.
This is easy to demonstrate with an example. A new column should be generated by
We start with this data.frame
df <- data.frame(
transaction_id=as.integer(c(1,1,1,2,2,2,2,2,3,3,3)),
item=as.character(c("crisps", "magnum", "gum",
"jerky", "cheese", "snickers", "ben&jerry", "magnum",
"halo", "crisps", "mars")),
icecream_bool=as.integer(c(0,1,0,
0,0,0,1,1,
1,0,0)),
stringsAsFactors = F
)
# transaction_id item icecream_bool
# 1 1 crisps 0
# 2 1 magnum 1
# 3 1 gum 0
# 4 2 jerky 0
# 5 2 cheese 0
# 6 2 snickers 0
# 7 2 ben&jerry 1
# 8 2 magnum 1
# 9 3 halo 1
# 10 3 crisps 0
# 11 3 mars 0
Generate ice_cream column like so
transaction_id item icecream_bool ice_cream
1 1 crisps 0 magnum
2 1 magnum 1 magnum
3 1 gum 0 magnum
4 2 jerky 0 ben&jerry
5 2 cheese 0 ben&jerry
6 2 snickers 0 ben&jerry
7 2 ben&jerry 1 ben&jerry
8 2 magnum 1 ben&jerry
9 3 halo 1 halo
10 3 crisps 0 halo
11 3 mars 0 halo
Upvotes: 0
Views: 49
Reputation: 389055
Another way if you only have 1's and 0's in icecream_bool
is by using which.max
library(dplyr)
df %>%
group_by(transaction_id) %>%
mutate(ice_cream = item[which.max(icecream_bool)])
# transaction_id item icecream_bool ice-cream
# <int> <chr> <int> <chr>
# 1 1 crisps 0 magnum
# 2 1 magnum 1 magnum
# 3 1 gum 0 magnum
# 4 2 jerky 0 ben&jerry
# 5 2 cheese 0 ben&jerry
# 6 2 snickers 0 ben&jerry
# 7 2 ben&jerry 1 ben&jerry
# 8 2 magnum 1 ben&jerry
# 9 3 halo 1 halo
#10 3 crisps 0 halo
#11 3 mars 0 halo
This can also be written in data.table
library(data.table)
setDT(df)[, ice_cream := item[which.max(icecream_bool)], transaction_id]
If there are values which are greater than 1 and we need to compare with 1 we can use match
df %>%
group_by(transaction_id) %>%
mutate(ice_cream = item[match(1,icecream_bool)])
Upvotes: 2
Reputation: 34556
You can do:
library(dplyr)
df %>%
group_by(transaction_id) %>%
mutate(icecream = first(item[icecream_bool == 1]))
# A tibble: 11 x 4
# Groups: transaction_id [3]
transaction_id item icecream_bool ice
<int> <chr> <int> <chr>
1 1 crisps 0 magnum
2 1 magnum 1 magnum
3 1 gum 0 magnum
4 2 jerky 0 ben&jerry
5 2 cheese 0 ben&jerry
6 2 snickers 0 ben&jerry
7 2 ben&jerry 1 ben&jerry
8 2 magnum 1 ben&jerry
9 3 halo 1 halo
10 3 crisps 0 halo
11 3 mars 0 halo
Upvotes: 1
Reputation: 18868
One way using dplyr
library(dplyr)
df%>%
group_by(transaction_id, icecream_bool) %>%
mutate(mr = row_number()) %>%
group_by(transaction_id) %>%
mutate(ice_cream=item[mr==1 & icecream_bool==1]) %>%
select(-mr)
# A tibble: 11 x 4
# Groups: transaction_id [3]
transaction_id item icecream_bool ice_cream
<int> <chr> <int> <chr>
1 1 crisps 0 magnum
2 1 magnum 1 magnum
3 1 gum 0 magnum
4 2 jerky 0 ben&jerry
5 2 cheese 0 ben&jerry
6 2 snickers 0 ben&jerry
7 2 ben&jerry 1 ben&jerry
8 2 magnum 1 ben&jerry
9 3 halo 1 halo
10 3 crisps 0 halo
11 3 mars 0 halo
Upvotes: 1
Reputation: 52358
I'm not sure if it's efficient, but I found a way
df <- data.frame(
transaction_id=as.integer(c(1,1,1,2,2,2,2,2,3,3,3)),
item=as.character(c("crisps", "magnum", "gum",
"jerky", "cheese", "snickers", "ben&jerry", "magnum",
"halo", "crisps", "mars")),
icecream_bool=as.integer(c(0,1,0,
0,0,0,1,1,
1,0,0)),
stringsAsFactors = F
)
df %>%
group_by(transaction_id) %>%
filter(icecream_bool == 1) %>%
distinct(transaction_id, icecream_bool, .keep_all = T) %>%
select(-icecream_bool) %>%
rename(icecream=item) %>%
left_join(df, ., by=c("transaction_id"="transaction_id"))
transaction_id item icecream_bool icecream
1 1 crisps 0 magnum
2 1 magnum 1 magnum
3 1 gum 0 magnum
4 2 jerky 0 ben&jerry
5 2 cheese 0 ben&jerry
6 2 snickers 0 ben&jerry
7 2 ben&jerry 1 ben&jerry
8 2 magnum 1 ben&jerry
9 3 halo 1 halo
10 3 crisps 0 halo
11 3 mars 0 halo
Upvotes: 0