Reputation: 1179
I have a data set as I've shown below:
df <- tribble(
~id, ~clicks,
"1", 10,
"2", 5,
"3", 7,
"4", 6,
"5", 3,
"6", 4,
"7", 5,
"8", 6,
"9", 1,
"10", 9,
)
Now, I want to set an interval from id column for the columns and then sum the rows for each interval. Simply, here is my desired data:
desired_df <- tribble(
~"1-2", ~"2-5", ~"5-10"
"26.7%", "28.5%", "44.6%"
)
How can I get the desired data?
Upvotes: 0
Views: 323
Reputation: 21
You can calculate the sum of clicks first, then make subgroups based on intervals, where you count the number of clicks:
library(dplyr)
library(tidyr)
total_clicks <- df %>% pull(clicks) %>% sum()
df %>%
group_by(subgroup = cut(as.numeric(id), breaks = c(0, 2, 5, 10))) %>%
summarise(percent = sum(clicks) / total_clicks) %>%
ungroup() %>%
spread(subgroup, percent)
Upvotes: 2
Reputation: 1778
A more tidyverse-like way of doing this would be:
df %>%
mutate(Group = ifelse(id %in% c("1", "2"), "Group1",
ifelse(id %in% c("3", "4", "5"),"Group2",
ifelse(id %in% c("6", "7", "8", "9", "10"),"Group3", "Others")))) %>%
group_by(Group) %>%
summarise(sumClicks = sum(clicks)) %>%
mutate(Proportion_of_Clicks = sumClicks / colSums(select(., sumClicks)))
The nested if-else is verbose, but simple for a beginner to understand and edit.
Upvotes: 1
Reputation: 28705
library(dplyr)
df %>%
# create a segmented id variable with cut, and group by it
group_by(idseg = cut(as.numeric(id), c(0, 2, 5, 10))) %>%
# sum the clicks in each segment
summarise(clicks = sum(clicks)) %>%
# divide these by the overall sum of clicks from df, and format as percent
mutate(clicks = scales::percent(clicks/sum(df$clicks))) %>%
# pivot wider with idseg as the colnames and clicks as the value
pivot_wider(names_from = idseg, values_from = clicks)
# # A tibble: 1 x 3
# `(0,2]` `(2,5]` `(5,10]`
# <chr> <chr> <chr>
# 1 26.8% 28.6% 44.6%
Upvotes: 3