Reputation: 131
I have a data frame that looks like this:
YEAR Region Illness_Code Illness_description COUNT
2014 A ABC test 222
2015 A ABC test 122
2016 A ABC test 111
2014 B XYZ testttt 333
2015 B XYZ testttt 3232
2016 B XYZ testttt 123
2014 C ABC test 333
2015 C ABC test 123
2016 C ABC test 123
.....
I can only get the coefficients for each distinct illness, but not per region.
Below is the code used:
# Keep only illnesses which occur every year
df <- df %>%
  group_by(Illness_Code) %>%
  filter(n() == 3)
# To dataframe
df <- data.frame(df)
# Loop through the dataframe and apply model
out <- lapply(
  unique(df$Illness_Code),
  function(c) {
    sub_cases <- subset(df, Illness_Code == c)
    m <- lm(formula = COUNT ~ YEAR, data = sub_cases)
    coef(m)
  })
# Format the data
out <- do.call(rbind, out)
# Make it a dataframe
out <- data.frame(out)
Results come out like this:
X.Intercept. YEAR
1 37254.05 -787.33
2 30745.21 3005.84
3 6992.99 2480.82
4 8391.65 3521.96
5 19298.03 -345.88
6 15163.82 -438.50
What I want is to get the coefficients for each distinct illness per region.
How do I group it by distinct illness and region?
The result should look like this:
Region Illness_Code Illness_description Intercept Slope COUNT_2016
A ABC test 222.123 15 111
A XYZ testttt 122.222 121.1 222
B ABC test ... ... ...
B XYZ testttt
C ABC test
C XYZ testttt
.....
Upvotes: 0
Views: 490
Reputation: 13135
library(dplyr)
library(tidyr) #nest
library(broom) #tidy
library(purrr) #map
df %>%
  group_by(Region, Illness_Code) %>%
  nest() %>%
  mutate(fit = map(data, ~lm(COUNT ~ YEAR, data = .)),
         results = map(fit, tidy)) %>%
  unnest(results)
# A tibble: 6 x 7
Region Illness_Code term estimate std.error statistic p.value
<fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 A ABC (Intercept) 111984. 51770. 2.16 0.276
2 A ABC YEAR -55.5 25.7 -2.16 0.276
3 B XYZ (Intercept) 212804. 3494736. 0.0609 0.961
4 B XYZ YEAR -105. 1734. -0.0605 0.962
5 C ABC (Intercept) 211768. 122153. 1.73 0.333
6 C ABC YEAR -105. 60.6 -1.73 0.333
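If you need the wide layout from the question (one row per region and illness, with Intercept, Slope and COUNT_2016 columns), the same nest/map idea can be adapted. This is only a sketch: the sample data is rebuilt from the rows shown in the question, and the names `wide`, `Intercept`, `Slope` and `COUNT_2016` are my own choices rather than anything required by the packages.

```r
library(dplyr)
library(tidyr)  # nest
library(purrr)  # map, map_dbl

# Sample data rebuilt from the rows shown in the question
df <- data.frame(
  YEAR = rep(2014:2016, times = 3),
  Region = rep(c("A", "B", "C"), each = 3),
  Illness_Code = rep(c("ABC", "XYZ", "ABC"), each = 3),
  Illness_description = rep(c("test", "testttt", "test"), each = 3),
  COUNT = c(222, 122, 111, 333, 3232, 123, 333, 123, 123)
)

wide <- df %>%
  group_by(Region, Illness_Code, Illness_description) %>%
  nest() %>%
  mutate(
    # One linear model per Region / Illness_Code combination
    fit = map(data, ~ lm(COUNT ~ YEAR, data = .x)),
    Intercept = map_dbl(fit, ~ coef(.x)[["(Intercept)"]]),
    Slope = map_dbl(fit, ~ coef(.x)[["YEAR"]]),
    # COUNT observed in 2016 (NA if the group has no 2016 row)
    COUNT_2016 = map_dbl(data, ~ .x$COUNT[.x$YEAR == 2016][1])
  ) %>%
  ungroup() %>%
  select(Region, Illness_Code, Illness_description,
         Intercept, Slope, COUNT_2016)

print(wide)
```

Pulling the coefficients out with `coef()` skips `tidy()` entirely, which is convenient when you only need the estimates and not the standard errors or p-values.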
Using lapply and split:
# Flag the list elements that have more than one row
Ind <- sapply(split(df, list(df$Region, df$Illness_Code)), function(x) nrow(x) > 1)
lapply(
  # Loop only through the list elements with nrow > 1
  split(df, list(df$Region, df$Illness_Code))[Ind],
  function(x) {
    m <- lm(formula = COUNT ~ YEAR, data = x)
    # One row per group: intercept, slope and the 2016 count
    as.data.frame(cbind(t(coef(m)), 'Year_2016' = x[x$YEAR == 2016, 'COUNT']))
  })
By default, split(df, list(df$Region, df$Illness_Code)) generates a list containing every interaction between the levels of Region and Illness_Code. Some of these interactions do not occur in the data, e.g. $B.ABC and $A.XYZ, so their list elements have nrow = 0. Calling lm() on an empty data frame fails, so we drop those elements with the indicator Ind before fitting.
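As an alternative to building an indicator, split() has a drop argument that discards the empty combinations up front. Here is a minimal base R sketch of that idea, assuming the sample data from the question; the names `groups`, `fits` and `result`, and the output column names, are mine. Note that unlike the nrow > 1 check, drop = TRUE still keeps groups with a single row, for which the slope would come out as NA.

```r
# Sample data rebuilt from the rows shown in the question
df <- data.frame(
  YEAR = rep(2014:2016, times = 3),
  Region = rep(c("A", "B", "C"), each = 3),
  Illness_Code = rep(c("ABC", "XYZ", "ABC"), each = 3),
  Illness_description = rep(c("test", "testttt", "test"), each = 3),
  COUNT = c(222, 122, 111, 333, 3232, 123, 333, 123, 123)
)

# drop = TRUE discards Region/Illness_Code combinations that have no rows
groups <- split(df, list(df$Region, df$Illness_Code), drop = TRUE)

fits <- lapply(groups, function(x) {
  m <- lm(COUNT ~ YEAR, data = x)
  data.frame(
    Region = x$Region[1],
    Illness_Code = x$Illness_Code[1],
    Intercept = unname(coef(m)[1]),
    Slope = unname(coef(m)[2]),
    # COUNT observed in 2016 (NA if the group has no 2016 row)
    COUNT_2016 = x$COUNT[x$YEAR == 2016][1]
  )
})

# Stack the one-row data frames into a single result
result <- do.call(rbind, fits)
rownames(result) <- NULL
print(result)
```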
Upvotes: 2