Reputation: 1440
I have a data frame like this:
           ds        y
1  2015-12-31 35.59050
2  2016-01-01 28.75111
3  2016-01-04 25.53158
4  2016-01-06 17.75369
5  2016-01-07 29.01500
6  2016-01-08 29.22663
7  2016-01-09 29.05249
8  2016-01-10 27.54387
9  2016-01-11 28.05674
10 2016-01-12 29.00901
11 2016-01-13 31.66441
12 2016-01-14 29.18520
13 2016-01-15 29.79364
14 2016-01-16 30.07852
I'm trying to write a loop that removes the rows whose values in the 'y'
column are above 34 or below 26, because that is where my outliers are:
for (i in grupo$y){if (i < 26) {grupo$y[i] = NA}}
I tried this to remove the values below 26. I don't get any errors, but those rows are not removed.
Any suggestions on how to remove those outliers?
Thanks in advance
Upvotes: 0
Views: 3475
Reputation: 16862
Here are a base R solution and a tidyverse solution. Part of the strength of R is that, for a problem like this one, its default of operating on whole vectors means you often don't need a for loop. The issue with your loop is that it assigns NA to values. That doesn't remove those rows; it just replaces their values with NA.
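To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question. The name grupo and the columns ds and y come from the question; flagged and kept are names made up for this illustration. It compares a vectorized NA assignment, which leaves every row in place, with logical indexing, which actually drops rows.

```r
# Sample data copied from the question
grupo <- data.frame(
  ds = as.Date(c("2015-12-31", "2016-01-01", "2016-01-04", "2016-01-06",
                 "2016-01-07", "2016-01-08", "2016-01-09", "2016-01-10",
                 "2016-01-11", "2016-01-12", "2016-01-13", "2016-01-14",
                 "2016-01-15", "2016-01-16")),
  y = c(35.59050, 28.75111, 25.53158, 17.75369, 29.01500, 29.22663,
        29.05249, 27.54387, 28.05674, 29.00901, 31.66441, 29.18520,
        29.79364, 30.07852)
)

# Vectorized NA assignment (hypothetical copy named `flagged`):
# the outlying values become NA, but every row is still present
flagged <- grupo
flagged$y[flagged$y < 26 | flagged$y > 34] <- NA
nrow(flagged)           # 14 rows, same as before
sum(is.na(flagged$y))   # 3 values were replaced by NA

# Logical indexing (hypothetical result named `kept`):
# only rows where the condition is TRUE are returned
kept <- grupo[grupo$y >= 26 & grupo$y <= 34, ]
nrow(kept)              # 11 rows remain
```

No loop is needed in either case, because the comparisons are evaluated over the whole y column at once.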
In base R, you can use subset to get the rows or columns of a data frame that meet certain criteria:
subset(grupo, y >= 26 & y <= 34)
#> # A tibble: 11 x 2
#>    ds             y
#>    <date>     <dbl>
#>  1 2016-01-01  28.8
#>  2 2016-01-07  29.0
#>  3 2016-01-08  29.2
#>  4 2016-01-09  29.1
#>  5 2016-01-10  27.5
#>  6 2016-01-11  28.1
#>  7 2016-01-12  29.0
#>  8 2016-01-13  31.7
#>  9 2016-01-14  29.2
#> 10 2016-01-15  29.8
#> 11 2016-01-16  30.1
Or, using dplyr functions, you can filter your data similarly and make use of dplyr::between. between(y, 26, 34) is shorthand for y >= 26 & y <= 34.
library(dplyr)
grupo %>%
  filter(between(y, 26, 34))
#> # A tibble: 11 x 2
#>    ds             y
#>    <date>     <dbl>
#>  1 2016-01-01  28.8
#>  2 2016-01-07  29.0
#>  3 2016-01-08  29.2
#>  4 2016-01-09  29.1
#>  5 2016-01-10  27.5
#>  6 2016-01-11  28.1
#>  7 2016-01-12  29.0
#>  8 2016-01-13  31.7
#>  9 2016-01-14  29.2
#> 10 2016-01-15  29.8
#> 11 2016-01-16  30.1
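As a quick check of that shorthand, the sketch below compares between() with the explicit comparison on a small vector. The vector vals is made up for illustration (a few values from the question plus the two boundary values 26 and 34), and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical test values, including both boundaries
vals <- c(35.59050, 26, 25.53158, 34, 29.01500)

# between() includes both endpoints
in_range <- between(vals, 26, 34)
in_range

# Element-wise comparison with the explicit form of the same condition
same_result <- all(in_range == (vals >= 26 & vals <= 34))
same_result
```

Because both bounds are inclusive, a value of exactly 26 or 34 is kept by the filter.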
Upvotes: 3
Reputation: 6132
With dplyr you could do:
library(dplyr)
df %>%
  filter(y >= 26 & y <= 34)
           ds        y
1  2016-01-01 28.75111
2  2016-01-07 29.01500
3  2016-01-08 29.22663
4  2016-01-09 29.05249
5  2016-01-10 27.54387
6  2016-01-11 28.05674
7  2016-01-12 29.00901
8  2016-01-13 31.66441
9  2016-01-14 29.18520
10 2016-01-15 29.79364
11 2016-01-16 30.07852
Upvotes: 2