Reputation: 135
I have a dataset that reads something like this:
record_id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
voucher_number <- c("app1", "00000", "11111", "22222", "11111", "app2", "33333", "44444", "33333",
"33333", "app", "55555", "66666", "55555", "66666", "55555", "77777")
ds <- data.frame(record_id, voucher_number, stringsAsFactors=FALSE)
   record_id voucher_number
1          1
2          1          00000
3          1          11111
4          1          22222
5          1          11111
6          2
7          2          33333
8          2          44444
9          2          33333
10         2          33333
11         3
12         3          55555
13         3          66666
14         3          55555
15         3          66666
16         3          55555
17         3          77777
I want to write a function that, after grouping by record_id, creates a new variable called ice. The value of ice should be app if voucher_number is missing. Otherwise, I want to number the occurrences of each voucher_number within a record_id as 1, 2, 3 and so forth: the first time a voucher_number appears for a record_id it gets 1, and each repeat of that same voucher_number gets the next number. A voucher_number that is not repeated within its record_id is simply 1.
Something like the following:
   record_id voucher_number ice
1          1           app1 app
2          1          00000   1
3          1          11111   1
4          1          22222   1
5          1          11111   2
6          2           app2 app
7          2          33333   1
8          2          44444   1
9          2          33333   2
10         2          33333   3
11         3           app3 app
12         3          55555   1
13         3          66666   1
14         3          55555   2
15         3          66666   2
16         3          55555   3
17         3          77777   1
Ultimately, I want the dataset to be ordered by record_id and voucher_number.
Thanks so much!
Upvotes: 0
Views: 152
Reputation: 388982
We can create a row number within each combination of record_id and voucher_number, and then replace the ice value with "app" wherever voucher_number has "app" in it.
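library(dplyr)
ds %>%
  # number the rows within each record_id / voucher_number combination
  group_by(record_id, voucher_number) %>%
  mutate(ice = row_number(),
         # rows whose voucher_number contains "app" get the label "app";
         # this also turns ice into a character column
         ice = replace(ice, grep('app', voucher_number), 'app'))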
# record_id voucher_number ice
# <dbl> <chr> <chr>
# 1 1 app1 app
# 2 1 00000 1
# 3 1 11111 1
# 4 1 22222 1
# 5 1 11111 2
# 6 2 app2 app
# 7 2 33333 1
# 8 2 44444 1
# 9 2 33333 2
#10 2 33333 3
#11 3 app app
#12 3 55555 1
#13 3 66666 1
#14 3 55555 2
#15 3 66666 2
#16 3 55555 3
#17 3 77777 1
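If you would rather not depend on dplyr, the same idea can be sketched in base R with ave(), followed by the ordering the question asks for. This is only a sketch under some assumptions: I am treating a row as "missing" when voucher_number is NA, empty, or starts with "app" (adjust that test to your real data), and the names is_missing, counts and ds_sorted are just illustrative.

```r
# Sample data from the question (with the quotes fixed so it parses).
record_id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
voucher_number <- c("app1", "00000", "11111", "22222", "11111",
                    "app2", "33333", "44444", "33333", "33333",
                    "app", "55555", "66666", "55555", "66666", "55555", "77777")
ds <- data.frame(record_id, voucher_number, stringsAsFactors = FALSE)

# Assumption: "missing" means NA, an empty string, or an "app..." placeholder.
is_missing <- is.na(ds$voucher_number) |
  ds$voucher_number == "" |
  grepl("^app", ds$voucher_number)

# Running count of each voucher_number within its record_id.
# ave() applies seq_along() to every (record_id, voucher_number) group
# and returns the counts in the original row order.
counts <- ave(seq_along(ds$voucher_number),
              ds$record_id, ds$voucher_number,
              FUN = seq_along)

# Label missing rows "app"; every other row keeps its running count.
ds$ice <- ifelse(is_missing, "app", as.character(counts))

# Order by record_id, then voucher_number, as requested in the question.
ds_sorted <- ds[order(ds$record_id, ds$voucher_number), ]
```

ds keeps the original row order with the new ice column, and ds_sorted is the ordered copy. Note that voucher_number is sorted as text here, so where the "app" rows land within each record_id depends on how your locale orders letters relative to digits.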
Upvotes: 2