Kak Schoen
Kak Schoen

Reputation: 384

R: Select duplicate rows of dataframe based on condition

I have made an example dataframe with 3 columns: Colour, Value and Size. Rows are filled with numeric values, text and empty cells.

Colour  Value   Size
Red     1       Big
Red     nd      Small
Blue    nd      Big
Blue    NA      Big
Blue    1       Medium
Yellow  1       NA
Yellow  2       Big
Green   NA      NA
Green   nd      Medium

The colour column has duplicated colours which can have different values in the Value column.

If a numeric value is found in the Value column, then I want to select that row together with the rows that have non numeric values of the same colour. So the new dataframe would look like this:

Colour  Value   Size
Red     1       Big
Red     nd      Small
Blue    nd      Big
Blue    NA      Big
Blue    1       Medium
Yellow  1       NA
Yellow  2       Big

So for example, if 1 row has a numeric value for the colour red, then I want to select every row for the red colour, numeric and non numeric.

If a colour has no numeric values at all, then it will be completely discarded, such as green in my example.

Any ideas how to accomplish this?

Upvotes: 4

Views: 1672

Answers (2)

NelsonGon
NelsonGon

Reputation: 13319

Using dplyr, we can filter by group. We match any Value that contains a digit because R's interpretation of a vector of the form c(1,NA,"nd") defaults to converting it to a character vector hence necessitating the use of some regular expressions to match any digit:

df %>% 
  group_by(Colour) %>% 
   filter(any(grepl("[0-9]",Value))) 
# A tibble: 7 x 3
# Groups:   Colour [3]
  Colour Value Size  
  <chr>  <chr> <chr> 
1 Red    1     Big   
2 Red    nd    Small 
3 Blue   nd    Big   
4 Blue   NA    Big   
5 Blue   1     Medium
6 Yellow 1     NA    
7 Yellow 2     Big  

Data:

df <-structure(list(Colour = c("Red", "Red", "Blue", "Blue", "Blue", 
"Yellow", "Yellow", "Green", "Green"), Value = c("1", "nd", "nd", 
NA, "1", "1", "2", NA, "nd"), Size = c("Big", "Small", "Big", 
"Big", "Medium", NA, "Big", NA, "Medium")), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 3

Jilber Urbina
Jilber Urbina

Reputation: 61214

In R base: you can use grep to locate any digit ("\\d+"), then use this location to obtain what color they belong to, afterwards,use this information to subset the orginal data frame dat

> subset(dat, Colour %in% dat[grep("\\d+", dat$Value), "Colour"])
  Colour Value   Size
1    Red     1    Big
2    Red    nd  Small
3   Blue    nd    Big
4   Blue  <NA>    Big
5   Blue     1 Medium
6 Yellow     1   <NA>
7 Yellow     2    Big

Upvotes: 3

Related Questions