bzzbzzRzzbzz
bzzbzzRzzbzz

Reputation: 111

Changing values on data subset based on conditions R

I ran into an issue when I am trying to manually change some values

Here is my data set

dat <- read.table(text='
id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            1           1
 1  "CRANBERRY 10PKTS CARTON"                            1           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA
', header=TRUE)

You can see that row 3 and row 4 has Category of 1. The conditions would be that row 3 and row 4 has values that can be found in the previous row (row 2), and that they continue to the next row (row 5). If so, they actually belong to Category 2 instead of Category 1 (yeah I know it is strange, but this is a requirement to treat them as the same).

I have multiple ids. I would like to only identify this kind of subset of data to achieve the desired outcome.

I have experimented with the idea of taking lag values of the Category to create an identifier on every decrease in the number from the Category. Let's ignore the scenario where there is an increase in the number from the Category first.

Expected output would be:

id  Item                                                 Category    Next_Category
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           1
 1  "CRANBERRY 10PKTS CARTON"                            2           1
 1  "CRANBERRY 10PKTS CARTON"                            2           2
 1  "CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON"   2           NA

Many thanks in advance!

Upvotes: 1

Views: 56

Answers (1)

akrun
akrun

Reputation: 887068

We create a sequence column ('i1'), reshape the 'wide' to 'long' format by splitting the 'Item' column by ', ' using cSplit, get the first element of 'Category', 'Next_Category' while collapsing the 'Item' with paste and finally assign the 'i1' to NULL

dt1[, i1 := seq_len(.N)]
library(splitstackshape)
cSplit(dt1, "Item", ", ", "long")[, 
    Category := Category[1L], .(id, Item)
    ][, c(list(Item = paste(Item, collapse=", ")), 
   Category = Category[1L], Next_Category = Next_Category[1L]),.(id, i1)
     ][, i1 := NULL][]
#   id                                             Item Category Next_Category
#1:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             2
#2:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             1
#3:  1                          CRANBERRY 10PKTS CARTON        2             1
#4:  1                          CRANBERRY 10PKTS CARTON        2             2
#5:  1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2            NA

We can also use a similar approach with tidyverse

library(tidyverse)
rownames_to_column(dat, "i1") %>% 
        separate_rows(Item, sep= ", ") %>% 
        group_by(i1, id) %>%
        mutate(Item = paste(Item, collapse=", ")) %>% 
        group_by(Item, add=TRUE) %>% 
        summarise_at(vars(Category, Next_Category), first) %>% 
        ungroup() %>% 
        select(-i1)
# A tibble: 5 × 4
#       id                                             Item Category Next_Category
#    <int>                                            <chr>    <int>         <int>
#1     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             2
#2     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2             1
#3     1                          CRANBERRY 10PKTS CARTON        1             1
#4     1                          CRANBERRY 10PKTS CARTON        1             2
#5     1 CRANBERRY 10PKTS CARTON, BLUEBERRY 20PKTS CARTON        2            NA

data

dt1 <- as.data.table(dat)

Upvotes: 3

Related Questions