Textime

Reputation: 99

Group by one variable, but summary() over all other variables (mean) in R

I know there are already some threads on this, but I haven't found one that covers this specific problem. The dependent variable in my dataset is Y, and I have 144 independent variables (X). Y and each X can only take the values 0 or 1. The data looks like this:

          Y    A469 T593 K022K A835 Z935 U83F W5326  ...
 Person1  1      1    1    1     0    0    0    0
 Person2  1      0    1    0     1    1    0    0
 Person3  0      0    0    1     0    0    1    1
 ...

 summary(dataset)

just provides descriptive statistics over all observations. What I want is (in pseudo-code):

summary(all variables, separately for Y == 1 and Y == 0)

It would be great if I could see how often a given X occurs for each value of Y. For example, for Y = 1: mean(X4) = 0.04 and count = 6.

Upvotes: 0

Views: 419

Answers (1)

Cettt

Reputation: 11981

EDIT 2: After Akrun's and Gregor's comments, here is the solution:

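 library(dplyr)
 # Group by Y (R is case-sensitive), store the group size in n, then average every column;
 # the mean of the constant n is just the group size
 data_summary <- dataset %>% group_by(Y) %>% 
    mutate(n = n()) %>%
    summarise_all(mean)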

If you want to see more columns than fit on your screen you can try, e.g.,

  • print(data_summary, width = Inf)
  • View(data_summary)
  • select(data_summary, <<particular columns you want to see>>)
  • ...
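
The question also asks for a count next to the mean. Since every X is 0 or 1, the count of ones per group is simply the sum. Here is a minimal sketch of that idea, assuming dplyr 1.0.0 or later (for across()) and a small invented data frame standing in for the real dataset; the values and the two X columns shown are purely illustrative:

```r
library(dplyr)

# Hypothetical toy data mirroring the question's layout (values are invented).
dataset <- data.frame(
  Y    = c(1, 1, 0, 0, 1),
  A469 = c(1, 0, 0, 1, 1),
  T593 = c(1, 1, 0, 0, 0)
)

data_summary <- dataset %>%
  group_by(Y) %>%
  summarise(
    # For every non-grouping column: share of ones (mean) and number of ones (sum).
    across(everything(), list(mean = mean, count = sum)),
    # Group size; defined after across() so it is not summarised itself.
    n = n()
  )

print(data_summary, width = Inf)
```

This should give one row per value of Y, with columns named like A469_mean and A469_count, plus n for the number of observations in each group.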

Upvotes: 2
