R: Select duplicate rows of dataframe based on condition

Question

I have made an example dataframe with 3 columns: Colour, Value and Size. Rows are filled with numeric values, text and empty cells.

Colour  Value   Size
Red     1       Big
Red     nd      Small
Blue    nd      Big
Blue    NA      Big
Blue    1       Medium
Yellow  1       NA
Yellow  2       Big
Green   NA      NA
Green   nd      Medium

The colour column has duplicated colours which can have different values in the Value column.

If a numeric value is found in the Value column, then I want to select that row together with the rows that have non numeric values of the same colour. So the new dataframe would look like this:

Colour  Value   Size
Red     1       Big
Red     nd      Small
Blue    nd      Big
Blue    NA      Big
Blue    1       Medium
Yellow  1       NA
Yellow  2       Big

So for example, if 1 row has a numeric value for the colour red, then I want to select every row for the red colour, numeric and non numeric.

If a colour has no numeric values at all, then it will be completely discarded, such as green in my example.

Any ideas how to accomplish this?

NelsonGon · Accepted Answer

Using dplyr, we can filter by group. We match any Value that contains a digit because R's interpretation of a vector of the form c(1,NA,"nd") defaults to converting it to a character vector hence necessitating the use of some regular expressions to match any digit:

df %>% 
  group_by(Colour) %>% 
   filter(any(grepl("[0-9]",Value))) 
# A tibble: 7 x 3
# Groups:   Colour [3]
  Colour Value Size  
      
1 Red    1     Big   
2 Red    nd    Small 
3 Blue   nd    Big   
4 Blue   NA    Big   
5 Blue   1     Medium
6 Yellow 1     NA    
7 Yellow 2     Big

Data:

df <-structure(list(Colour = c("Red", "Red", "Blue", "Blue", "Blue", 
"Yellow", "Yellow", "Green", "Green"), Value = c("1", "nd", "nd", 
NA, "1", "1", "2", NA, "nd"), Size = c("Big", "Small", "Big", 
"Big", "Medium", NA, "Big", NA, "Medium")), class = "data.frame", row.names = c(NA, 
-9L))

R: Select duplicate rows of dataframe based on condition

Answers (2)

Related Questions