Reputation: 384
I have made an example dataframe with 3 columns: Colour, Value and Size. Rows are filled with numeric values, text and empty cells.
Colour Value Size
Red 1 Big
Red nd Small
Blue nd Big
Blue NA Big
Blue 1 Medium
Yellow 1 NA
Yellow 2 Big
Green NA NA
Green nd Medium
The colour column has duplicated colours which can have different values in the Value column.
If a numeric value is found in the Value column, then I want to select that row together with the rows that have non numeric values of the same colour. So the new dataframe would look like this:
Colour Value Size
Red 1 Big
Red nd Small
Blue nd Big
Blue NA Big
Blue 1 Medium
Yellow 1 NA
Yellow 2 Big
So for example, if 1 row has a numeric value for the colour red, then I want to select every row for the red colour, numeric and non numeric.
If a colour has no numeric values at all, then it will be completely discarded, such as green in my example.
Any ideas how to accomplish this?
Upvotes: 4
Views: 1672
Reputation: 13319
Using dplyr
, we can filter
by group
. We match any Value
that contains a digit because R's interpretation of a vector of the form c(1,NA,"nd")
defaults to converting it to a character vector hence necessitating the use of some regular expressions to match any digit:
df %>%
group_by(Colour) %>%
filter(any(grepl("[0-9]",Value)))
# A tibble: 7 x 3
# Groups: Colour [3]
Colour Value Size
<chr> <chr> <chr>
1 Red 1 Big
2 Red nd Small
3 Blue nd Big
4 Blue NA Big
5 Blue 1 Medium
6 Yellow 1 NA
7 Yellow 2 Big
Data:
df <-structure(list(Colour = c("Red", "Red", "Blue", "Blue", "Blue",
"Yellow", "Yellow", "Green", "Green"), Value = c("1", "nd", "nd",
NA, "1", "1", "2", NA, "nd"), Size = c("Big", "Small", "Big",
"Big", "Medium", NA, "Big", NA, "Medium")), class = "data.frame", row.names = c(NA,
-9L))
Upvotes: 3
Reputation: 61214
In R base: you can use grep
to locate any digit ("\\d+"
), then use this location to obtain what color they belong to, afterwards,use this information to subset
the orginal data frame dat
> subset(dat, Colour %in% dat[grep("\\d+", dat$Value), "Colour"])
Colour Value Size
1 Red 1 Big
2 Red nd Small
3 Blue nd Big
4 Blue <NA> Big
5 Blue 1 Medium
6 Yellow 1 <NA>
7 Yellow 2 Big
Upvotes: 3