Reputation: 371
I have a data.frame that looks like this:
# A tibble: 2,003 x 16
barcost barrulesplay barrulessch barrulesrelax barrulesinjury barriskskills barraincold barrainsick barrainmessy barraininjury barrainparentdis… barrainchilddis… barrainchildclo…
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 3 4 3 4 4 4 NA NA NA NA NA NA NA
2 2 5 5 5 3 5 NA NA NA NA NA NA NA
3 2 2 2 3 2 4 NA NA NA NA NA NA NA
4 2 4 4 4 2 4 NA NA NA NA NA NA NA
5 2 3 3 4 2 4 NA NA NA NA NA NA NA
6 2 4 4 4 3 4 NA NA NA NA NA NA NA
7 3 5 5 4 2 4 NA NA NA NA NA NA NA
8 4 5 5 4 4 3 NA NA NA NA NA NA NA
9 1 5 5 5 3 5 NA NA NA NA NA NA NA
10 2 4 4 4 3 4 NA NA NA NA NA NA NA
When I use the "describe" function from Hmisc as follows, I get a list of lists (as expected):
describe(questions)
Here I can see the data I want to extract and plot is in "frequency" under "values" of this list of lists.
How would I create a tidy data.frame that, for every column, holds the frequencies of 1s, 2s, 3s, etc. found in the list output from the "describe" function above? For example:
summary[["barcost"]][["values"]]
$value
[1] 1 2 3 4 5
$frequency
[1] 348 806 410 360 79
So a data.frame that has the column headers as variables (under a column named "questions", for example) and then (using the "barcost" question above) 348 1s, 806 2s, etc., all for the "barcost" question variable.
I am aware that I may be trying to do something very complex when there is a simpler way of achieving the same goal, so I am open to suggestions.
Upvotes: 1
Views: 1355
Reputation: 93871
You can get frequencies by column more directly. gather will convert the data to "long" format, which facilitates tabulation by group.
library(tidyverse)
freq = gather(questions) %>% group_by(key, value) %>% tally
Then you can graph the results, for example, like this:
ggplot(freq, aes(value, n)) +
geom_col() +
facet_wrap(~ key)
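In current tidyr releases, gather is superseded by pivot_longer. As a minimal sketch of the same tabulation, here is a version that uses pivot_longer and count and names the output columns the way the question asks for. The questions_demo data frame and its values are made up for illustration, and the column names "question" and "frequency" are my own choices:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the `questions` data (values are invented)
questions_demo <- data.frame(
  barcost      = c(1L, 2L, 2L, 3L, NA),
  barrulesplay = c(4L, 4L, 5L, 5L, 5L)
)

freq_demo <- questions_demo %>%
  # One row per (question, answer) pair
  pivot_longer(everything(), names_to = "question", values_to = "value") %>%
  # Count how often each answer occurs within each question
  count(question, value, name = "frequency")

print(freq_demo)
```

Note that count keeps NA as its own row, so missing answers are tabulated as well. Add a filter(!is.na(value)) step before count if you would rather drop them.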
If we start with the output of describe, you could do this:
freq = map_df(describe(questions), ~.x$values, .id="Column")
However, describe doesn't return frequencies for columns with fewer than three unique values, so this approach would exclude any such columns from the resulting freq data frame.
UPDATE: If I understand your comment, here's a way to color based on proportions of values:
# Fake data
set.seed(2)
dat = replicate(10, sample(1:5, 50, replace=TRUE))
# Get frequencies and proportions
freq = dat %>% as.data.frame %>%
gather() %>%
group_by(key, value) %>%
tally %>%
mutate(pct=n/sum(n))
ggplot(freq, aes(value, n, fill=pct)) +
geom_col() +
facet_wrap(~ key, ncol=5) +
scale_fill_gradient(low="red", high="blue")
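One detail worth spelling out: tally drops only the last grouping variable (value), so the result is still grouped by key. That is why sum(n) inside mutate is a per-column total, and pct is the share of each value within its own column. Here is a quick check of that, as a sketch on made-up data (it uses pivot_longer in place of gather, and the names dat_demo, freq_check and pct_totals are just for illustration):

```r
library(dplyr)
library(tidyr)

# Fake data: 3 columns of 50 random answers each
set.seed(2)
dat_demo <- as.data.frame(replicate(3, sample(1:5, 50, replace = TRUE)))

freq_check <- dat_demo %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  group_by(key, value) %>%
  tally() %>%                 # result remains grouped by key
  mutate(pct = n / sum(n))    # so sum(n) is computed within each key

# The proportions should add up to 1 within every column
pct_totals <- freq_check %>% summarise(total = sum(pct))
print(pct_totals)
```

If you use count(key, value) instead of group_by plus tally, the result comes back ungrouped, and you would need group_by(key) before the mutate to get per-column proportions.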
Upvotes: 2