Reputation: 47
I have a dataframe loaded from an Excel file. It is similar to this:
Gender Country Effect Use Products
Male UK 1 2 7
Female USA 2 4 6
Male Russia 3 5 2
Female China 4 2 3
Male China 3 1 6
Female USA 2 5 2
Male UK 3 3 1
Female Russia 4 1 7
I want to calculate the mean per Country, like in the example below (excluding Gender):
Country Effect Use Products
UK 3 2 7
USA 2 4 4
Russia 3 5 5
China 4 2 4
I used code similar to the following to perform this operation (where "d" is the name of the data frame):
country_avg <- aggregate(d[, 3:5], list(d$`Country `), mean)
However, instead of the desired output, the resulting data frame looks like this:
Group1 Effect Use Products
UK NA NA NA
USA NA NA NA
Russia NA NA NA
China NA NA NA
The numbers in my data frame are not recognized as numeric values (I've tested that using is.numeric). In addition, R returns many warning messages like the following:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
How can I fix this problem?
Upvotes: 0
Views: 48
Reputation: 886938
Remove the backquotes: the backquoted name `Country ` includes a trailing space, which may not be part of the column name in the original data.
aggregate(d[, 3:5], list(d$Country), FUN = mean)
Or use the formula method, which keeps the original column name for the grouping column:
aggregate(. ~ Country, data = d[-1], FUN = mean)
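To make the naming difference concrete, here is a minimal sketch using the sample data from the question. The data frame is rebuilt inline so the snippet runs on its own, and the names `by_list`, `by_named_list`, and `by_formula` are only illustrative. It assumes the value columns are already numeric.

```r
# Sample data from the question, rebuilt inline (numeric columns assumed)
d <- data.frame(
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
  Country = c("UK", "USA", "Russia", "China", "China", "USA", "UK", "Russia"),
  Effect = c(1L, 2L, 3L, 4L, 3L, 2L, 3L, 4L),
  Use = c(2L, 4L, 5L, 2L, 1L, 5L, 3L, 1L),
  Products = c(7L, 6L, 2L, 3L, 6L, 2L, 1L, 7L)
)

# Unnamed list: the grouping column is called "Group.1"
by_list <- aggregate(d[, 3:5], list(d$Country), FUN = mean)

# Named list: the grouping column takes the name given in the list
by_named_list <- aggregate(d[, 3:5], list(Country = d$Country), FUN = mean)

# Formula method: the grouping column keeps its original name
by_formula <- aggregate(. ~ Country, data = d[-1], FUN = mean)

names(by_list)
names(by_named_list)
names(by_formula)
```

All three calls compute the same group means; only the name of the grouping column differs.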
d <- structure(list(
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"),
  Country = c("UK", "USA", "Russia", "China", "China", "USA", "UK", "Russia"),
  Effect = c(1L, 2L, 3L, 4L, 3L, 2L, 3L, 4L),
  Use = c(2L, 4L, 5L, 2L, 1L, 5L, 3L, 1L),
  Products = c(7L, 6L, 2L, 3L, 6L, 2L, 1L, 7L)),
  class = "data.frame", row.names = c(NA, -8L))
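The question also mentions that is.numeric returns FALSE and that mean warns "argument is not numeric or logical". That suggests the value columns may have been imported from Excel as character. This is an assumption, since the actual file is not shown. If so, converting them before aggregating may help. A rough sketch, where `d_chr`, `d_num`, and `value_cols` are hypothetical names used only for illustration:

```r
# Hypothetical reproduction: the sample data with the value columns
# stored as character, as can happen after an Excel import
d_chr <- data.frame(
  Country = c("UK", "USA", "Russia", "China", "China", "USA", "UK", "Russia"),
  Effect = c("1", "2", "3", "4", "3", "2", "3", "4"),
  Use = c("2", "4", "5", "2", "1", "5", "3", "1"),
  Products = c("7", "6", "2", "3", "6", "2", "1", "7"),
  stringsAsFactors = FALSE
)

# Inspect the column classes first
sapply(d_chr, class)

# Convert only the value columns to numeric
value_cols <- c("Effect", "Use", "Products")
d_num <- d_chr
d_num[value_cols] <- lapply(d_num[value_cols], as.numeric)

# The group means can now be computed
country_avg <- aggregate(. ~ Country, data = d_num, FUN = mean)
country_avg
```

If the columns turn out to be factors rather than character, as.numeric(as.character(x)) is the safer conversion, because as.numeric applied directly to a factor returns the internal level codes. Entries that cannot be parsed as numbers become NA with a warning, which is worth checking before trusting the means.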
Upvotes: 2