Reputation: 111
I ran into an issue when I am trying to manually change some values
Here is my data set
dat <- read.table(text='
id Item Category Next_Category
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 2
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 1
1 "CRANBERRY 10PKTS CARTON" 1 1
1 "CRANBERRY 10PKTS CARTON" 1 2
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 NA
', header=TRUE)
You can see that row 3 and row 4 has Category of 1. The conditions would be that row 3 and row 4 has values that can be found in the previous row (row 2), and that they continue to the next row (row 5). If so, they actually belong to Category 2 instead of Category 1 (yeah I know it is strange, but this is a requirement to treat them as the same).
I have multiple ids. I would like to only identify this kind of subset of data to achieve the desired outcome.
I have experimented with the idea of taking lag values of the Category to create an identifier on every decrease in the number from the Category. Let's ignore the scenario where there is an increase in the number from the Category first.
Expected output would be:
id Item Category Next_Category
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 2
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 1
1 "CRANBERRY 10PKTS CARTON" 2 1
1 "CRANBERRY 10PKTS CARTON" 2 2
1 "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON" 2 NA
Many thanks in advance!
Upvotes: 1
Views: 56
Reputation: 887068
We create a sequence column ('i1'), reshape the 'wide' to 'long' format by splitting the 'Item' column by ', '
using cSplit
, get the first element of 'Category', 'Next_Category' while collapsing the 'Item' with paste
and finally assign the 'i1' to NULL
dt1[, i1 := seq_len(.N)]
library(splitstackshape)
cSplit(dt1, "Item", ", ", "long")[,
Category := Category[1L], .(id, Item)
][, c(list(Item = paste(Item, collapse=", ")),
Category = Category[1L], Next_Category = Next_Category[1L]),.(id, i1)
][, i1 := NULL][]
# id Item Category Next_Category
#1: 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 2
#2: 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 1
#3: 1 CRANBERRY 10PKTS CARTON 2 1
#4: 1 CRANBERRY 10PKTS CARTON 2 2
#5: 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 NA
We can also use a similar approach with tidyverse
library(tidyverse)
rownames_to_column(dat, "i1") %>%
separate_rows(Item, sep= ", ") %>%
group_by(i1, id) %>%
mutate(Item = paste(Item, collapse=", ")) %>%
group_by(Item, add=TRUE) %>%
summarise_at(vars(Category, Next_Category), first) %>%
ungroup() %>%
select(-i1)
# A tibble: 5 × 4
# id Item Category Next_Category
# <int> <chr> <int> <int>
#1 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 2
#2 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 1
#3 1 CRANBERRY 10PKTS CARTON 1 1
#4 1 CRANBERRY 10PKTS CARTON 1 2
#5 1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON 2 NA
dt1 <- as.data.table(dat)
Upvotes: 3