R Capturing regression slopes by group in a dataframe

Question

My dataframe consists of scores for different questions asked in a survey, over 3 fiscal years (FY13, FY14 & FY15). The results are presented by Region.

Here's what a sample of the actual dataframe looks like, where we have two questions per region, asked in different years.

testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
              Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
              QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
              Very.Satisfied=runif(16,min = 0, max=1),
              Total.Very.Satisfied=floor(runif(16,min=10,max=120)))

My Objective

For each region, my objective is to identify which question experienced the most significant upward evolution across this 3-year time frame. To measure upward movement, I have decided to use the slope of a linear regression of the score on the fiscal year.

The question with the most significant upward evolution within a region over the 3-year time frame will be the one with the steepest positive slope.

Using this logic, I have decided to do the following:

1) For each combination of Region and QST, I run the lm function.

2) I extract the slope for each combination, and store it as a separate variable. Then for each region I filter out the question with the maximum slope value.

My Attempt

Here is my attempt at solving this.

test_final=testdf %>%   
group_by(Region,QST) %>% 
map(~lm(FY ~ Very.Satisfied, data = .)) %>%
map_df(tidy) %>%
filter(term == 'circumference') %>%
select(estimate) %>% 
summarise(Value = max(estimate))

However, when I run this, I get an error message saying that object FY was not found.

Additional requirement

Also I'd like this to work only for questions that have at least 2 consecutive years of data for comparison. But I'm unable to figure out how to factor this condition into my code.

Any help with this would be greatly appreciated.

A. S. K. · Accepted Answer

This doesn't do the "at least two consecutive years" part, but it does the "get the question with the largest slope" part:

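library(dplyr)
test_final = testdf %>%
  # Convert "FY13" etc. to the number 13 so it can serve as a numeric predictor
  mutate(FY.num = as.numeric(gsub("FY", "", FY))) %>%
  group_by(Region, QST) %>%
  # Fit one regression per Region/QST group and keep the slope on FY.num
  mutate(lm_slope = lm(Very.Satisfied ~ FY.num)$coefficients[["FY.num"]]) %>%
  ungroup() %>%
  # Within each region, keep the rows of the question with the largest slope
  group_by(Region) %>%
  filter(lm_slope == max(lm_slope))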
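
For the "at least two consecutive years" part, one possible approach is to drop any Region/QST group that has no pair of fiscal years exactly one year apart before fitting the regression. The sketch below is only that, a sketch: it assumes "consecutive" means any two adjacent fiscal years (FY13 and FY14, or FY14 and FY15), and it uses a small hypothetical data frame `toydf` with fixed values instead of the question's `testdf`, because `runif()` gives different slopes on every run. It also needs dplyr 1.0.0 or later for `slice_max()` and the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy data with fixed scores (not the question's testdf).
# ASIA/Q5 only has FY13 and FY15, so it has no consecutive years.
toydf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15",
             "FY13", "FY15", "FY13", "FY14", "FY15"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 5)),
  QST    = c(rep("Q2", 3), rep("Q5", 2), rep("Q5", 2), rep("Q2", 3)),
  Very.Satisfied = c(0.10, 0.20, 0.30, 0.20, 0.60,
                     0.10, 0.90, 0.30, 0.40, 0.45)
)

slopes <- toydf %>%
  mutate(FY.num = as.numeric(gsub("FY", "", FY))) %>%
  group_by(Region, QST) %>%
  # Sort years within each group so diff() compares neighbouring years
  arrange(FY.num, .by_group = TRUE) %>%
  # Assumption: keep a question only if two of its years are one year apart
  filter(any(diff(FY.num) == 1)) %>%
  # One row per Region/QST with the fitted slope on FY.num
  summarise(lm_slope = coef(lm(Very.Satisfied ~ FY.num))[["FY.num"]],
            .groups = "drop")

# For each region, keep the question with the largest slope
best <- slopes %>%
  group_by(Region) %>%
  slice_max(lm_slope, n = 1) %>%
  ungroup()

print(best)
```

Without the consecutive-years filter, ASIA/Q5 would win its region on the strength of just FY13 and FY15; with the filter it is excluded and ASIA/Q2 is selected instead. If only strictly upward trends should count, adding `filter(lm_slope > 0)` before `slice_max()` would drop regions where every remaining question is flat or declining.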
