Reputation: 1792
I have with some data with missing values (i.e. NA values), the simplified format is below (code for input at the end):
#> id x country
#> 1 1 2.0 USA
#> 2 2 4.0 USA
#> 3 3 3.5 JPN
#> 4 4 NA JPN
For each country, I'd like to take the mean of x
, and a count of usable values of x
(i.e. not NA), so I've used group_by
, and it works for the mean
:
df <- df %>% group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
#count_x = count(x))
)
df
#> # A tibble: 4 x 4
#> # Groups: country [2]
#> id x country mean_x
#> <dbl> <dbl> <fct> <dbl>
#> 1 1 2 USA 3
#> 2 2 4 USA 3
#> 3 3 3.5 JPN 3.5
#> 4 4 NA JPN 3.5
but when I try to add the count()
, I'm getting an error
library(tidyverse)
df <- data.frame(id = c(1, 2, 3, 4),
x = c(2, 4, 3.5, NA),
country = c("USA", "USA", "JPN", "JPN")
)
df
df <- df %>% group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count_x = count(x))
)
df
#> Error in UseMethod("summarise_") : no applicable method for 'summarise_' applied to an
#> object of class "c('double', 'numeric')"
My desired output would be:
#> id x country mean_x count
#> <dbl> <dbl> <fct> <dbl>
#> 1 1 2 USA 3 2
#> 2 2 4 USA 3 2
#> 3 3 3.5 JPN 3.5 1
#> 4 4 NA JPN 3.5 1
Reproducible code below:
library(tidyverse)
df <- data.frame(id = c(1, 2, 3, 4),
x = c(2, 4, 3.5, NA),
country = c("USA", "USA", "JPN", "JPN")
)
df
df <- df %>% group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count_x = count(x))
)
df
Upvotes: 2
Views: 2640
Reputation: 887118
We can also create the 'count' with a group_by
n()
library(dplyr)
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
summarise(n = n())
# A tibble: 2 x 2
# country n
# <fct> <int>
#1 JPN 2
#2 USA 2
Upvotes: 1
Reputation: 388982
count
is not the right function here. The first argument to count
is a dataframe or tibble specifically. However, what you are passing is a vector hence you get the error. Also count
summarises the dataframe so that you have only one row per group. See for example,
library(dplyr)
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
count(country)
# country n
# <fct> <int>
#1 JPN 2
#2 USA 2
If you want to add a new column without summarising, use add_count
instead
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
add_count(country)
# id x country mean_x n
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 2
#4 4 NA JPN 3.5 2
However, both of this function don't do what you need. To count non-NA values per group, you need
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count = length(na.omit(x)))
#OR
#count = sum(!is.na(x)))#as @Humpelstielzchen mentioned
# id x country mean_x count
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 1
#4 4 NA JPN 3.5 1
Upvotes: 1