Reputation: 35
I have a data.frame
df where observations are cities and each column describes the amount of a certain pesticide used in that city (around 300 of them). The data.frame
looks like this:
head(df)
# A tibble: 6 x 11
Benzovindiflupir Beta_ciflutrina Beta_Cipermetrina Bicarbonato_de_potássio Bifentrina Bispiribaque_sódi~ Bixafem Boscalida
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.354 1.10 0.0219 0 4.99 0.0155 0.0686 0
2 11.3 2.43 0.130 0 8.84 0 2.08 22.2
3 0.329 0.267 0 0 1.86 0.00746 0.00583 0
4 4.56 2.71 0 0 7.39 0 1.04 0
5 0.925 0.512 0 0 0.612 0 0.119 0
6 0.388 0.149 0.000641 0 1.61 0.143 0.0777 0
I would like to remove all pesticides (columns) whose sum of that pesticide use in all cities is below a threshold. I called this threshold pestCutoff
.
pestCutoff <- 10000
So I want to create a new data.frame
called subDf with all rows and only the columns whose sum of all rows is >= cutoff
, but I haven`t found a way of filtering/subsetting/selecting columns based on a logical rule.
Upvotes: 2
Views: 1615
Reputation: 1
we may also use select_if
with sum
:
library(dplyr)
subDf <- df %>% select_if(~sum(.) >= pestCutoff)
Upvotes: 0
Reputation: 79204
We could also use which(colSums)
library(dplyr)
subDf <- df %>% select(which(colSums(.) >= pestCutoff))
data:
df <- tribble(
~Benzovindiflupir, ~Beta_ciflutrina, ~Beta_Cipermetrina, ~Bicarbonato_de_potássio, ~Bifentrina, ~Bispiribaque_sódi, ~Bixafem, ~Boscalida,
0.354, 1.10, 0.0219, 0, 4.99, 0.0155, 0.0686, 0,
11.3, 2.43, 0.130, 0, 8.84, 0, 2.08, 22.2,
0.329, 0.267, 0, 0, 1.86, 0.00746, 0.00583, 0,
4.56, 2.71, 0, 0, 7.39, 0, 1.04, 0,
0.925, 0.512, 0, 0, 0.612, 0, 0.119, 0,
0.388, 0.149, 0.000641, 0, 1.61, 0.143, 0.0777, 0)
Upvotes: 1
Reputation: 887851
We can use select
with a condition on the sum
i.e. if the sum
of that column greater than threshold, then select it
library(dplyr)
subDf <- df %>%
select(where( ~ sum(.) >= pestCutoff))
NOTE: Here we assume that the condition should be applied to all columns and all of them are numeric. If there are other columns types and do not want to select, create a short circuit
subDf <- df %>%
select(where(~ is.numeric(.) && sum(.) >= pestCutoff))
Or in base R
with Filter
Filter(function(x) sum(x) >= pestCutoff, df)
Or with colSums
df[colSums(df) >= pestCutoff]
Upvotes: 1