Reputation: 173
I have a task that seems easy, but after working on it for a few hours I've decided that I'm stumped.
I have a dataframe:
mydata <- read.table(header=TRUE, text="
rime point sound
Y Y Y
N N Y
Y Y Y
NA NA NA
")
I would like my dataframe to look like this:
mydata <- read.table(header=TRUE, text="
standard Y N NA
rime 2 1 1
point 2 1 1
sound 3 0 1
")
My first thought was to use dplyr::count()
. I can get the correct numbers, but I have over 100 columns and don't want to call all of them by hand. Is there an R function that I could use to get the count I'm looking for?
Upvotes: 1
Views: 140
Reputation: 270338
Using only base R we convert to long form using stack
and then perform the frequency counting using xtabs
. To get it into the orientation shown in the question the resulting table is transposed.
t(xtabs(~., stack(mydata), addNA = TRUE))
## values
## ind N Y <NA>
## rime 1 2 1
## point 1 2 1
## sound 0 3 1
This variation also works and gives a similar result. (The data portions of both are the same but the class of the xtabs
solution is c("xtabs", "table")
and it has a call
attribute whereas the one below has the "table"
class.)
t(table(stack(mydata), useNA = "ifany"))
We can use tapply
giving a matrix output. We first change the NA's to ordinary levels since tapply
would remove those NAs.
s <- transform(stack(mydata), values = addNA(values))
t(tapply(rownames(s), s, length, default = 0))
Using tidyr we can convert to long form and back to wide form giving a tibble result:
library(tidyr)
mydata %>%
pivot_longer(everything()) %>%
pivot_wider(name, names_from = "value", values_fn = length, values_fill = 0)
## # A tibble: 3 x 4
## name Y N `NA`
## <chr> <int> <int> <int>
## 1 rime 2 1 1
## 2 point 2 1 1
## 3 sound 3 0 1
ctable
in the summarytools package has many arguments to customnize the output. Here is the default output.
library(summarytools)
with(stack(mydata), ctable(ind, values))
giving:
Cross-Tabulation, Row Proportions
ind * values
Data Frame: stack
------- -------- ----------- ----------- ----------- -------------
values N Y <NA> Total
ind
rime 1 (25.0%) 2 (50.0%) 1 (25.0%) 4 (100.0%)
point 1 (25.0%) 2 (50.0%) 1 (25.0%) 4 (100.0%)
sound 0 ( 0.0%) 3 (75.0%) 1 (25.0%) 4 (100.0%)
Total 2 (16.7%) 7 (58.3%) 3 (25.0%) 12 (100.0%)
------- -------- ----------- ----------- ----------- -------------
Have added additional approaches.
Upvotes: 4
Reputation: 39613
Try this. The key is to format you data to long, compute the desired summary (in your case you want the number of observations per groups). After that you can reshape again to wide in order to format the data as you want. Here the code using a tidyverse
approach with the data you shared mydata
:
library(tidyverse)
#Code
mydata %>%
pivot_longer(everything()) %>%
group_by(name,value) %>%
summarise(N=n()) %>% ungroup() %>%
group_by(name) %>% mutate(id=cur_group_id()) %>%
pivot_wider(names_from = value,values_from=N) %>%
select(-id) %>%
replace(is.na(.),0)
Output:
# A tibble: 3 x 4
# Groups: name [3]
name N Y `NA`
<chr> <int> <int> <int>
1 point 1 2 1
2 rime 1 2 1
3 sound 0 3 1
Upvotes: 0