Reputation: 19
I am trying to compute several summary statistics for each column of a data frame.
My data have this structure:
john steven mark
1 2 4
3 2 5
4 5 NA
2 3 4
6 NA 1
3 7 4
For each column, I need the mean, the standard deviation, the number of responses, and the number of NAs.
So, I need a table like this:
john steven mark
mean 3.4 4.0 4.5
sd 1.22 1.0 1.22
n 6 6 6
NA's 0 1 1
*The means and SDs shown are not correct; this is just an illustration.
How can I get these results with one simple piece of code? I know that with the tidyverse,
if the data were transposed, I could easily use group_by and then ask for these things, but when the names are in columns, I don't know how to do it.
Upvotes: 2
Views: 49
Reputation: 1959
Using the tidyverse set of packages makes it straightforward to compute several summary statistics at once:
library(tidyverse)                        # Load the libraries
df <- tribble(~john, ~steven, ~mark,      # Build the tibble
              1, 2, 4,
              3, 2, 5,
              4, 5, NA,
              2, 3, 4,
              6, NA, 1,
              3, 7, 4)
df %>%
  pivot_longer(cols = everything()) %>%   # Make the data long,...
  group_by(name) %>%                      # ...group by each name ...
  summarise(n = length(value),            # ...extract the parameters
            sd = sd(value, na.rm = TRUE),
            mean = mean(value, na.rm = TRUE),
            NAs = sum(is.na(value)))
Gives:
# A tibble: 3 x 5
name n sd mean NAs
<chr> <int> <dbl> <dbl> <int>
1 john 6 1.72 3.17 0
2 mark 6 1.52 3.6 1
3 steven 6 2.17 3.8 1
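The question asks for the statistics as rows and the names as columns. One way to get that layout is to reshape the summary a second time. This is only a minimal sketch: it assumes dplyr and tidyr (1.0.0 or later, for pivot_longer and pivot_wider) are installed, rebuilds the data with base data.frame so it runs on its own, and uses the illustrative column names stat and result, which are not from the original answer.

```r
library(dplyr)
library(tidyr)

# Same values as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(john   = c(1, 3, 4, 2, 6, 3),
                 steven = c(2, 2, 5, 3, NA, 7),
                 mark   = c(4, 5, NA, 4, 1, 4))

wide <- df %>%
  pivot_longer(cols = everything()) %>%            # long: one row per (name, value)
  group_by(name) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd   = sd(value, na.rm = TRUE),
            n    = n(),                            # counts rows, NAs included
            NAs  = sum(is.na(value))) %>%
  pivot_longer(cols = -name,                       # stack the four statistics...
               names_to = "stat", values_to = "result") %>%
  pivot_wider(names_from = name,                   # ...and spread names to columns
              values_from = result)

wide
```

The result has one row per statistic (mean, sd, n, NAs) and one column per name. Because group_by sorts its groups, the name columns come out in alphabetical order rather than in the original column order.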
Upvotes: 0
Reputation: 6206
A data.table solution (dat is the data frame from the question):
library(data.table)
transpose(
  melt(as.data.table(dat), measure.vars = names(dat))[,
    list(
      mean = mean(value, na.rm = TRUE),
      sd = sd(value, na.rm = TRUE),
      n = .N,
      `NA's` = sum(is.na(value))),
    by = variable
  ]
)
1: john steven mark
2: 3.16666666666667 3.8 3.6
3: 1.72240142436851 2.16794833886788 1.51657508881031
4: 6 6 6
5: 0 1 1
Upvotes: 1
Reputation: 101099
Try the code below:
outer(
  c(
    mean = function(x) mean(x, na.rm = TRUE),
    sd = function(x) sd(x, na.rm = TRUE),
    n = length,
    NAs = function(x) sum(is.na(x))
  ),
  df,
  Vectorize(function(f, x) f(x))
)
which gives
john steven mark
mean 3.166667 3.800000 3.600000
sd 1.722401 2.167948 1.516575
n 6.000000 6.000000 6.000000
NAs 0.000000 1.000000 1.000000
Upvotes: 2
Reputation: 388817
In base R, using sapply:
sapply(df, function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    n = length(x),
    `NA's` = sum(is.na(x)))
})
# john steven mark
#mean 3.166667 3.800000 3.600000
#sd 1.722401 2.167948 1.516575
#n 6.000000 6.000000 6.000000
#NA's 0.000000 1.000000 1.000000
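If the real data frame also holds non-numeric columns, mean and sd will not give useful results for them, so it may help to select the numeric columns first. The following is a sketch of that variation, not part of the original answer: the id column and the helper name summarise_col are hypothetical, added only for illustration, and vapply is swapped in for sapply so that the shape of the result is checked.

```r
# Data from the question plus a hypothetical non-numeric column `id`
df <- data.frame(john   = c(1, 3, 4, 2, 6, 3),
                 steven = c(2, 2, 5, 3, NA, 7),
                 mark   = c(4, 5, NA, 4, 1, 4),
                 id     = letters[1:6])

# Hypothetical helper: the four statistics for one column
summarise_col <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    n      = length(x),          # counts all rows, NAs included
    `NA's` = sum(is.na(x)))
}

# Keep only the numeric columns, then summarise each one;
# vapply errors if any column does not return exactly four numbers
numeric_cols <- vapply(df, is.numeric, logical(1))
res <- vapply(df[numeric_cols], summarise_col, numeric(4))

res
```

The result is a numeric matrix with the statistics as rows and the numeric columns as columns, in the same layout as the sapply output above.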
Upvotes: 4