R - conditional labelling, but not the first one

Question

I have a dataset of the following structure (dummy data, but similar to what I have):


data <- data.frame(msg = c("this is sample 1", "another text", "cats are cute", "another text", "", "...", "another text", "missing example case", "cats are cute"), 
                   no = c(1, 15, 23, 9, 7, 5, 35, 67, 35), 
                   pat = c(0.11, 0.45, 0.3, 0.2, 0.6, 0.890, 0.66, 0.01, 0))

I'm interested in the msg column. I need to label each row TRUE or FALSE in a new column named usable, according to the following conditions:

If the msg cell is empty (NA or an empty string) => FALSE
If the msg cell contains only symbols (no letters, no numbers) => FALSE
If the same msg already appeared in an earlier row (rows are in ascending order) => FALSE. The first occurrence is TRUE; only the repeats are FALSE.

The other columns are irrelevant to the comparison, but the end result must keep all of them.

I wrote a very lengthy approach with a for loop, but I am looking for something shorter and better performing, since the original dataset is long.

rjen · Accepted Answer

A tidyverse option. Note that map2_lgl is for convenience rather than speed.

library(dplyr)
library(purrr)
library(stringr)

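data %>%
  mutate(id = row_number(),
         usable = map2_lgl(msg, id,
                           # .x is the message, .y is its row number
                           ~ case_when(is.na(.x) | .x == '' ~ FALSE,          # missing or empty
                                       !str_detect(.x, '\\w') ~ FALSE,        # no word characters
                                       .x %in% msg[seq_len(.y - 1)] ~ FALSE,  # seen in an earlier row
                                       TRUE ~ TRUE))) %>%
  select(-id)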

#                    msg no  pat usable
# 1     this is sample 1  1 0.11   TRUE
# 2         another text 15 0.45   TRUE
# 3        cats are cute 23 0.30   TRUE
# 4         another text  9 0.20  FALSE
# 5                       7 0.60  FALSE
# 6                  ...  5 0.89  FALSE
# 7         another text 35 0.66  FALSE
# 8 missing example case 67 0.01   TRUE
# 9        cats are cute 35 0.00  FALSE
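
Since the answer notes that map2_lgl is for convenience rather than speed, here is a minimal vectorized sketch in base R that may suit a long dataset better. It is not part of the accepted answer. It assumes that "only symbols" means "no alphanumeric character", tested with the POSIX class [[:alnum:]], which differs slightly from \w because \w also matches an underscore. The helper name has_text is made up for illustration.

```r
data <- data.frame(
  msg = c("this is sample 1", "another text", "cats are cute", "another text",
          "", "...", "another text", "missing example case", "cats are cute"),
  no  = c(1, 15, 23, 9, 7, 5, 35, 67, 35),
  pat = c(0.11, 0.45, 0.3, 0.2, 0.6, 0.890, 0.66, 0.01, 0)
)

# TRUE when msg is not NA and contains at least one letter or digit
# (this covers both the "empty" and the "symbols only" conditions)
has_text <- !is.na(data$msg) & grepl("[[:alnum:]]", data$msg)

# duplicated() is FALSE for the first occurrence and TRUE for later repeats
data$usable <- has_text & !duplicated(data$msg)

data
```

Both grepl() and duplicated() work on the whole column at once, so no per-row lookup over the earlier rows is needed, and all original columns are kept.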
