datazang
datazang

Reputation: 1179

How to pivot data in R?

I have a data set as I've shown below:

df <- tribble(
  ~id,  ~clicks,       
  "1",    10,    
  "2",     5,    
  "3",     7,    
  "4",     6,    
  "5",     3,    
  "6",     4,    
  "7",     5,    
  "8",     6,    
  "9",     1,    
  "10",    9,   
)

Now, I want to set an interval from id column for the columns and then sum the rows for each interval. Simply, here is my desired data:

desired_df <- tribble(
  ~"1-2",   ~"2-5",    ~"5-10"      
  "26.7%",  "28.5%",   "44.6%"
)

How can I get the desired data?

Upvotes: 0

Views: 323

Answers (3)

Emil
Emil

Reputation: 21

You can calculate the sum of clicks first, then make subgroups based on intervals, where you count the number of clicks:

library(dplyr)
library(tidyr)

total_clicks <- df %>% pull(clicks) %>% sum()

df %>% 
  group_by(subgroup = cut(as.numeric(id), breaks = c(0, 2, 5, 10))) %>% 
  summarise(percent = sum(clicks) / total_clicks) %>% 
  ungroup() %>% 
  spread(subgroup, percent)

Upvotes: 2

Dan
Dan

Reputation: 1778

A more tidyverse-like way of doing this would be:

df %>% 
  mutate(Group = ifelse(id %in% c("1", "2"), "Group1",
                        ifelse(id %in% c("3", "4", "5"),"Group2",
                               ifelse(id %in% c("6", "7", "8", "9", "10"),"Group3", "Others")))) %>% 
  group_by(Group) %>% 
  summarise(sumClicks = sum(clicks)) %>% 
  mutate(Proportion_of_Clicks = sumClicks / colSums(select(., sumClicks)))

The nested if-else is verbose, but simple for a beginner to understand and edit.

Upvotes: 1

IceCreamToucan
IceCreamToucan

Reputation: 28705

library(dplyr)

df %>% 
  # create a segmented id variable with cut, and group by it
  group_by(idseg = cut(as.numeric(id), c(0, 2, 5, 10))) %>% 
  # sum the clicks in each segment
  summarise(clicks = sum(clicks)) %>% 
  # divide these by the overall sum of clicks from df, and format as percent
  mutate(clicks = scales::percent(clicks/sum(df$clicks))) %>% 
  # pivot wider with idseg as the colnames and clicks as the value
  pivot_wider(names_from = idseg, values_from = clicks)

# # A tibble: 1 x 3
#   `(0,2]` `(2,5]` `(5,10]`
#   <chr>   <chr>   <chr>   
# 1 26.8%   28.6%   44.6%   

Upvotes: 3

Related Questions