Reputation: 11
Is there a way to remove specific values using R programming?
For example, I have a variable named Survive. This variable has input values such as "Y", "N", and "U". Is there a code to remove all the "U" values?
I am fairly new to R and tried this code:
project$Survive = "U"<-NULL
which obviously did not work.
Upvotes: 0
Views: 1053
Reputation: 2374
Here are two possible solutions using the package {dplyr}: one filters out (drops) the observations where Survive is "U", and the other keeps every row but converts "U" to a missing value (NA). I would encourage looking at {dplyr}, which has a quite intuitive interface for data manipulation. A base R alternative using subset() is included at the end.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
project <- tribble(
  ~id, ~Survive, ~value,
  1, "A", 42,
  1, "U", 31,
  2, "A", 21,
  2, "U", 11
)
project %>%
  # filter (keep) observations where Survive is not equal to "U".
  filter(Survive != "U")
#> # A tibble: 2 x 3
#>      id Survive value
#>   <dbl> <chr>   <dbl>
#> 1     1 A          42
#> 2     2 A          21
project %>%
  # mutate "U" into character missing (NA) when Survive == "U".
  mutate(Survive = if_else(Survive == "U", NA_character_, Survive))
#> # A tibble: 4 x 3
#>      id Survive value
#>   <dbl> <chr>   <dbl>
#> 1     1 A          42
#> 2     1 <NA>       31
#> 3     2 A          21
#> 4     2 <NA>       11
# base R approach for keeping a subset of observations.
(res_project <- subset(project, Survive != "U"))
#> # A tibble: 2 x 3
#>      id Survive value
#>   <dbl> <chr>   <dbl>
#> 1     1 A          42
#> 2     2 A          21
Created on 2021-07-04 by the reprex package (v2.0.0)
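Two details are worth keeping in mind with the approaches above. Both filter() and subset() also drop rows where Survive is already NA, because the comparison evaluates to NA there. And if Survive is stored as a factor rather than as character, the "U" level stays attached after the rows are removed. The following is a minimal base R sketch of both cases. The sample data frame is made up for illustration, and it assumes Survive is a factor, which the question does not state.

```r
# Hypothetical sample data mirroring the question's "Y"/"N"/"U" values.
project <- data.frame(
  id = 1:6,
  Survive = factor(c("Y", "U", "N", NA, "U", "Y"))
)

# Option 1: drop the "U" rows. %in% never returns NA, so rows where
# Survive is already missing are kept rather than silently dropped.
kept <- project[!(project$Survive %in% "U"), ]

# The unused "U" level stays attached to the factor until it is dropped.
kept$Survive <- droplevels(kept$Survive)
levels(kept$Survive)

# Option 2: keep every row but turn "U" into NA.
# which() returns only the positions where the comparison is TRUE.
recoded <- project
recoded$Survive[which(recoded$Survive == "U")] <- NA
recoded$Survive <- droplevels(recoded$Survive)
table(recoded$Survive, useNA = "ifany")
```

As for the attempt in the question, as far as I can tell project$Survive = "U"<-NULL is parsed as project$Survive = ("U" <- NULL), so it assigns NULL to the whole column (which removes it) instead of touching only the "U" values.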
Upvotes: 1