Reputation: 323
I have a dataset in which four variables have the same values. Now I want to calculate the percentage shares for the values in each variable, so I can plot them in a stacked bar chart.
This is an example dataset:
climate_change air_quality water_polution trash
<chr> <chr> <chr> <chr>
1 Not a very serious probl~ A somewhat serious prob~ A somewhat serious problem A very serious problem
2 Not a very serious probl~ Not a very serious prob~ Not a very serious problem Not a very serious pr~
3 NA NA NA NA
4 NA NA NA NA
5 A very serious problem A very serious problem A very serious problem A very serious problem
6 A somewhat serious probl~ A very serious problem Not at all a serious prob~ A somewhat serious pr~
I know how to calculate the percentage shares for each variable, e.g.:
lebanon %>%
filter(!is.na(climate_change)) %>%
count(climate_change) %>%
mutate(prop = n / sum(n))
Getting:
climate_change n prop
<chr> <int> <dbl>
1 A somewhat serious problem 348 0.286
2 A very serious problem 620 0.510
3 Not a very serious problem 202 0.166
4 Not at all a serious problem 45 0.0370
What I want now is to find a solution for keeping the values listed as rows while having the variables listed as columns with n and/or prop values. What would be the most efficient way to get that?
I want to have something like that:
climate_change air_quality ..... .....
<chr> <dbl>
1 A somewhat serious problem 0.286 .....
2 A very serious problem 0.510 .....
3 Not a very serious problem 0.166
4 Not at all a serious problem 0.0370
I had a hard time desciribing this problem and to find a similiar question on this site. I hope I have described it well and in the case you know a similiar question, just link it here. :)
Greetings
Upvotes: 3
Views: 45
Reputation: 3876
Like this?
library(tidyverse)
df %>%
pivot_longer(1:4) %>%
filter(!is.na(value)) %>%
count(name, value) %>%
group_by(name) %>%
mutate(prop = n / sum(n)) %>%
select(-n) %>%
pivot_wider(names_from = name, values_from = prop)
# A tibble: 4 x 5
value air_quality climate_change trash water_polution
<chr> <dbl> <dbl> <dbl> <dbl>
1 A somewhat serious probl 0.25 0.25 0.25 0.25
2 A very serious problem 0.5 0.25 0.5 0.25
3 Not a very serious probl 0.25 0.5 0.25 0.25
4 Not at all a serious prob NA NA NA 0.25
Data
df <- tibble::tribble(
~climate_change, ~air_quality, ~water_polution, ~trash,
"Not a very serious probl", "A somewhat serious probl", "A somewhat serious probl", "A very serious problem",
"Not a very serious probl", "Not a very serious probl", "Not a very serious probl", "Not a very serious probl",
NA, NA, NA, NA,
NA, NA, NA, NA,
"A very serious problem", "A very serious problem", "A very serious problem", "A very serious problem",
"A somewhat serious probl", "A very serious problem", "Not at all a serious prob", "A somewhat serious probl"
)
Upvotes: 0
Reputation: 1363
You can use the pivot functions from {tidyr} to apply your solution to a long form version of the data frame, and then pivot it back to the original shape.
data <- tribble(~Q1, ~Q2, ~Q3,
'ans1', 'ans1', 'ans1',
'ans1', 'ans2', 'ans2',
'ans2', 'ans2', 'ans2',
'ans1', 'ans3', 'ans2',
'ans3', 'ans1', NA,
'ans3', 'ans3', 'ans1',
NA , 'ans2', NA,)
data %>%
pivot_longer(everything()) %>%
group_by(name) %>%
count(value) %>%
drop_na() %>% # If you omit this line, NA values will be
# counted as a separate answer.
mutate(prop = n / sum(n)) %>%
select(-n) %>%
pivot_wider(values_from = prop, values_fill = list(prop = 0))
# If there is no proportion for a given Q/A combination,
# it is because the answer has not been given to this question.
# A tibble: 3 x 4
value Q1 Q2 Q3
<chr> <dbl> <dbl> <dbl>
1 ans1 0.5 0.286 0.4
2 ans2 0.167 0.429 0.6
3 ans3 0.333 0.286 0
Upvotes: 2