Andres Mora

Reputation: 1106

How to extract specific rows from dataset based on word condition

I have this sample dataset

df=structure(list(V1 = c("", "", "", ""), V2 = c("Segunda", "VACUNA SinoVac", 
"Primera", "PARTICULAR"), V3 = c("Dosis por aplicar", "UNIDAD DE SERVICIOS DE", 
"Aplicada", ""), V4 = c(NA, NA, "16", "SALUD CALLE 153"), V5 = c(NA, 
NA, "7", NA), V6 = c(NA, NA, "2021 202105061K No registra", NA
), V7 = c(NA, NA, "6", NA), V8 = c(NA, NA, "8", NA), V9 = c(NA, 
NA, "2021", NA), V10 = c(NA, NA, "ADRIANA JAIME", NA), V11 = c(NA_character_, 
NA_character_, NA_character_, NA_character_), V12 = c(NA_character_, 
NA_character_, NA_character_, NA_character_)), row.names = 53:56, class = "data.frame")

I'm currently extracting the row (let's call it Row X) that contains the word "Aplicada":

df.out1 = df %>% filter_all(any_vars(. %in% c("Aplicada")))

Now I also need to extract the entire row before Row X, so the desired result is:

structure(list(V1 = c("", ""), V2 = c("VACUNA SinoVac", "Primera"
), V3 = c("UNIDAD DE SERVICIOS DE", "Aplicada"), V4 = c(NA, "16"
), V5 = c(NA, "7"), V6 = c(NA, "2021 202105061K No registra"), 
    V7 = c(NA, "6"), V8 = c(NA, "8"), V9 = c(NA, "2021"), V10 = c(NA, 
    "ADRIANA JAIME"), V11 = c(NA_character_, NA_character_), 
    V12 = c(NA_character_, NA_character_)), row.names = 54:55, class = "data.frame")

Could you please advise?

Upvotes: 0

Views: 62

Answers (3)

rjen

Reputation: 1972

A tidyverse option.

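library(dplyr)
library(stringr)

# Row numbers of every row in which some column contains 'Aplicada'
keep <- df %>%
  mutate(id = row_number()) %>%
  filter(if_any(everything(), ~ str_detect(., 'Aplicada'))) %>%
  pull(id)

# Keep each matching row plus the row before it; this also works for
# several matches, and setdiff() drops index 0 for a match in row 1
df %>%
  slice(sort(setdiff(c(keep - 1, keep), 0)))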
  
#   V1             V2                     V3   V4   V5                          V6   V7   V8   V9
# 1    VACUNA SinoVac UNIDAD DE SERVICIOS DE <NA> <NA>                        <NA> <NA> <NA> <NA>
# 2           Primera               Aplicada   16    7 2021 202105061K No registra    6    8 2021
#             V10  V11  V12
# 1          <NA> <NA> <NA>
# 2 ADRIANA JAIME <NA> <NA>

Upvotes: 1

Szymon Fraś

Reputation: 106

I wrote some code that should do what you want.

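# y[i] is TRUE for every row that should be kept
y <- logical(nrow(df))

for (i in seq_len(nrow(df))) {
  # does any cell in row i equal "Aplicada"?
  y[i] <- any(df[i, ] %in% "Aplicada")
  # if so, also flag the row before it
  if (i > 1 && y[i]) {
    y[i - 1] <- TRUE
  }
}

df[y, ]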

I tried to use an apply function instead of a loop, but it didn't work correctly.
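For what it's worth, here is a minimal sketch of how the same idea might look with apply() instead of a loop. The small `df` below is a cut-down, hypothetical stand-in for the question's data, and `hit` and `keep` are names made up for this illustration.

```r
# Cut-down stand-in for the question's data (assumption: same layout, fewer columns)
df <- data.frame(
  V2 = c("Segunda", "VACUNA SinoVac", "Primera", "PARTICULAR"),
  V3 = c("Dosis por aplicar", "UNIDAD DE SERVICIOS DE", "Aplicada", ""),
  V4 = c(NA, NA, "16", "SALUD CALLE 153"),
  stringsAsFactors = FALSE
)

# TRUE for each row in which any cell equals "Aplicada"
hit <- apply(df, 1, function(row) any(row %in% "Aplicada"))

# Also flag the row immediately before each hit:
# shift `hit` up by one position and pad the end with FALSE
keep <- hit | c(hit[-1], FALSE)

res <- df[keep, ]
res
```

Because the shift pads with FALSE rather than subtracting from an index, a match in the first row simply keeps that row on its own.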

Upvotes: 1

utubun

Reputation: 4520

This will fail if the match is found in the first row:

# `df` is the data frame from the question
dplyr::slice(
  df, 
  # for each matching row x, select rows x - 1 and x
  sapply(which(rowSums(df == 'Aplicada', TRUE) == 1), \(x) { (x - 1):x }) 
)

#   V1             V2                     V3   V4   V5  <truncated>
# 1    VACUNA SinoVac UNIDAD DE SERVICIOS DE <NA> <NA>  <truncated>
# 2           Primera               Aplicada   16 7 202 <truncated>
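
If the first-row case matters, one possible workaround (a sketch only, in base R) is to build the row indices first and drop anything below 1 before subsetting. The tiny `df` here is made-up data with a match in the first row, and `hits` and `idx` are illustrative names.

```r
# Made-up data with matches in rows 1 and 3 (assumption, not the question's df)
df <- data.frame(
  V2 = c("Aplicada", "x", "Aplicada"),
  V3 = c(NA, "y", "z"),
  stringsAsFactors = FALSE
)

# Rows in which at least one cell equals "Aplicada" (NA cells are ignored)
hits <- which(rowSums(df == "Aplicada", na.rm = TRUE) > 0)

# Each hit plus the row before it, de-duplicated and in order
idx <- sort(unique(c(hits - 1, hits)))

# Discard index 0, which appears when the match is in the first row
idx <- idx[idx >= 1]

res <- df[idx, , drop = FALSE]
res
```

Using `> 0` rather than `== 1` also keeps rows where the word appears in more than one column.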

Upvotes: 1
