newbProgramR

Reputation: 19

How to obtain data from columns

I am trying to compute several summary statistics for each column of a data frame.

My data have this structure.

john   steven   mark
 1       2       4
 3       2       5
 4       5       NA 
 2       3       4 
 6       NA      1
 3       7       4

For each column, I need the mean, the sd, the number of responses, and the number of NAs.

So, I need a table like this

       john    steven   mark 
mean    3.4      4.0     4.5
sd      1.22     1.0     1.22
n        6        6       6
NA's     0        1       1

*The means and sds are not correct; this is just an illustration.

So how can I get these results with one simple piece of code? I know that with tidyverse, if the data were transposed, I could easily group_by and then ask for these things, but when the names are in columns, I don't know how to do it.

Upvotes: 2

Views: 49

Answers (4)

Tech Commodities

Reputation: 1959

Using the tidyverse set of packages makes it straightforward to add different summary statistics:

library(tidyverse) # Load the libraries

df <- tribble(~john, ~steven, ~mark, # Build the tibble
              1,       2,       4,
              3,       2,       5,
              4,       5,       NA, 
              2,       3,       4,
              6,       NA,      1,
              3,       7,       4)

df %>% 
  pivot_longer(cols = everything()) %>% # Make the data long,...
  group_by(name) %>% # ...group by each name ...
  summarise(n = length(value), # ...extract the parameters
            sd = sd(value, na.rm = TRUE), 
            mean = mean(value, na.rm = TRUE), 
            NAs = sum(is.na(value)))

Gives:

# A tibble: 3 x 5
  name       n    sd  mean   NAs
  <chr>  <int> <dbl> <dbl> <int>
1 john       6  1.72  3.17     0
2 mark       6  1.52  3.6      1
3 steven     6  2.17  3.8      1
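
The summary above has one row per name, while the question asks for the statistics as rows and the names as columns. One possible way to get that layout, sketched here on the assumption that dplyr and tidyr are installed, is to pivot the summary a second time. The names summary_long, summary_wide and stat are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- tibble(
  john   = c(1, 3, 4, 2, 6, 3),
  steven = c(2, 2, 5, 3, NA, 7),
  mark   = c(4, 5, NA, 4, 1, 4)
)

# One row per name, one column per statistic
summary_long <- df %>%
  pivot_longer(cols = everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd   = sd(value, na.rm = TRUE),
            n    = n(),
            NAs  = sum(is.na(value)))

# Flip it: one row per statistic, one column per name
summary_wide <- summary_long %>%
  pivot_longer(cols = -name, names_to = "stat") %>%
  pivot_wider(names_from = name, values_from = value)

summary_wide
```

Because group_by sorts the names, the columns come out in alphabetical order (john, mark, steven) rather than in the original column order.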

Upvotes: 0

user438383

Reputation: 6206

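A data.table solution (dat is the data frame from the question):

library(data.table)
transpose(
    # convert to a data.table and reshape to long format: one row per (column, value)
    melt(as.data.table(dat), measure.vars = names(dat))[, 
        list(
            mean = mean(value, na.rm = TRUE), 
            sd = sd(value, na.rm = TRUE), 
            n = .N,                       # rows per column, NAs included
            `NA's` = sum(is.na(value))), 
        by = variable
        ]
    )

Output:

                 V1               V2               V3
1:             john           steven             mark
2: 3.16666666666667              3.8              3.6
3: 1.72240142436851 2.16794833886788 1.51657508881031
4:                6                6                6
5:                0                1                1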

Upvotes: 1

ThomasIsCoding

Reputation: 101099

Try the code below

outer(
  c(
    mean = function(x) mean(x, na.rm = TRUE),
    sd = function(x) sd(x, na.rm = TRUE),
    n = length,
    NAs = function(x) sum(is.na(x))
  ),
  df,
  Vectorize(function(f, x) f(x))
)

which gives

         john   steven     mark
mean 3.166667 3.800000 3.600000
sd   1.722401 2.167948 1.516575
n    6.000000 6.000000 6.000000
NAs  0.000000 1.000000 1.000000

Upvotes: 2

Ronak Shah

Reputation: 388817

In base R, using sapply:

sapply(df, function(x) {
  c(mean = mean(x, na.rm = TRUE), 
    sd = sd(x, na.rm = TRUE), 
    n = length(x), 
    `NA's` = sum(is.na(x)))
})

#         john   steven     mark
#mean 3.166667 3.800000 3.600000
#sd   1.722401 2.167948 1.516575
#n    6.000000 6.000000 6.000000
#NA's 0.000000 1.000000 1.000000
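
If "how many responses" is meant as the number of non-missing values rather than the column length, a variant along these lines might help. This is only a sketch: the helper name column_summary and the extra n_valid row are assumptions added for illustration, and vapply is swapped in for sapply so that the shape of each result is checked.

```r
# Same data as in the question
df <- data.frame(
  john   = c(1, 3, 4, 2, 6, 3),
  steven = c(2, 2, 5, 3, NA, 7),
  mark   = c(4, 5, NA, 4, 1, 4)
)

# Hypothetical helper: summary statistics for one numeric column
column_summary <- function(x) {
  c(mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    n       = length(x),        # all rows, NAs included
    n_valid = sum(!is.na(x)),   # non-missing responses only
    NAs     = sum(is.na(x)))
}

# vapply errors if any column does not return exactly 5 numbers
result <- vapply(df, column_summary, FUN.VALUE = numeric(5))

round(result, 2)
```

The result is a numeric matrix with the statistics as row names and the original columns as column names, so a single value can be read with, for example, result["n_valid", "steven"].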

Upvotes: 4
