Subsetting a dataframe in R based on positions of elements in the columns

Question

I have a dataframe df with 5 rows and 6 columns. The columns category1 to category5 may or may not contain elements (empty cells are stored as "").

Case 1: df having elements in category1 to category5

df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

Case 2: df having no elements in category1 to category5

For Case 1, in each of the columns category1 to category5, I need to retain only the element that appears at the topmost position,

and then drop the rows that have no elements left in these five columns.

For Case 2, I would like to retain only the topmost row.

How do I combine the solutions for both cases?

Ronak Shah · Accepted Answer

You can handle both cases with a conditional statement inside slice(): keep the rows that still have a non-empty category value, or fall back to the first row when there are none -

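library(dplyr)  # if_any() needs dplyr >= 1.0.4

df %>%
  # In each category column, blank everything except the first non-empty value
  mutate(across(starts_with('category'), ~replace(., -match(TRUE, . != ''), ''))) %>%
  slice({
    # TRUE for rows that still have a non-empty value in any category column
    tmp <- if_any(starts_with('category'), ~. != '')
    # Keep those rows if there are any (case 1), otherwise only row 1 (case 2)
    if(any(tmp)) which(tmp) else 1
  })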

For case 1 this returns -

#  Hits category1 category2 category3 category4 category5
#1 Hit1        a1                  a3                    
#2 Hit3                                                a5
#3 Hit5                  a2
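
To see why each column keeps only its topmost entry, it may help to try the replace()/match() idiom on a single vector. This is only a small sketch; the vectors x and y are made up for illustration and are not part of the question's data.

```r
# Illustrative vector (made up for this sketch), laid out like one category column
x <- c("", "b1", "a1", "c1")

# Index of the first non-empty element
first <- match(TRUE, x != "")
first
# [1] 2

# A negative index selects every position except `first`; those are blanked
replace(x, -first, "")
# [1] ""   "b1" ""   ""

# All-empty column: match() returns NA, and assigning through an NA index
# changes nothing, so the column comes back as it was
y <- c("", "", "")
replace(y, -match(TRUE, y != ""), "")
# [1] "" "" ""
```

The second part is what lets a column like category4, which has no elements at all, pass through the mutate step unchanged.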

For case 2 -

#  Hits category1 category2 category3 category4 category5
#1 Hit1
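
If you would rather avoid the dplyr dependency, the same logic can be sketched in base R. This is not the code above, just an equivalent under some assumptions: keep_topmost() is a hypothetical helper name, and since the question does not show the Case 2 data, df2 below assumes the same Hits with every category column empty.

```r
# Hypothetical helper (not from the answer above): base R version of the same logic
keep_topmost <- function(df, prefix = "category") {
  cols <- grep(paste0("^", prefix), names(df), value = TRUE)
  # In each category column, blank everything except the first non-empty value
  df[cols] <- lapply(df[cols], function(x) replace(x, -match(TRUE, x != ""), ""))
  # Rows that still hold at least one non-empty category value
  keep <- rowSums(df[cols] != "") > 0
  # Case 2 fallback: if no row qualifies, keep only the topmost row
  out <- if (any(keep)) df[keep, , drop = FALSE] else df[1, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Case 1 data, copied from the question
df1 <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

# Assumed Case 2 data: same Hits, every category column empty
df2 <- df1
df2[-1] <- ""

keep_topmost(df1)$Hits
# [1] "Hit1" "Hit3" "Hit5"
keep_topmost(df2)$Hits
# [1] "Hit1"
```

The if/else on any(keep) plays the same role as the conditional inside slice().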
