Reputation: 87
I am trying to develop a Shiny dashboard app that produces a bar graph for different outcome variables selected by the user. To do so, I need to subset my data reactively to generate aggregate data frames. The code below filters my data reactively without problems, but I run into trouble when I try to use dplyr::summarise() reactively.
Here is my data:
dput(head(df))
structure(
list(
geoid = c(
"01001020200",
"01001020300",
"01001020700",
"01001020802",
"01001021000",
"01001021100"
),
state = c(
"Alabama",
"Alabama",
"Alabama",
"Alabama",
"Alabama",
"Alabama"
),
county = c(
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County"
),
ozzone = structure(
c(1L, 1L, 2L, 1L, 1L, 1L),
.Label = c("non.oz", "oz"),
class = "factor"
),
tract_type = c(
"LICs",
"Contiguous",
"LICs",
"Contiguous",
"Contiguous",
"LICs"
),
investment_score_1_low_10_high = c(4,
6, 9, 10, 5, 6),
socioeconomic_change_flag_1_yes_blank_no = c(0,
0, 0, 0, 0, 0),
fips_county = c("01001", "01001", "01001", "01001",
"01001", "01001"),
total_empl = c(51809L, 51809L, 51809L, 51809L,
51809L, 51809L),
total_payroll = c(338395L, 338395L, 338395L,
338395L, 338395L, 338395L),
total_establishments = c(5090L, 5090L,
5090L, 5090L, 5090L, 5090L),
largest_employer = c(72L, 72L, 72L,
72L, 72L, 72L),
largest_employer_bypayroll = c(44L, 44L, 44L,
44L, 44L, 44L),
trend_employee_change = c(
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046
),
trend_payroll_change = c(
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037
),
trend_establishment_change = c(
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084
),
damage_cost_weather_total = c(20000, 20000, 20000, 20000,
20000, 20000),
deaths_weather_total = c(0L, 0L, 0L, 0L, 0L, 0L),
medianrent = c(537, 633, 525, 680, 409, 303),
vacancyrate = c(
0.108200455580866,
0.113652113652114,
0.0436681222707424,
0.0512166859791425,
0.229962546816479,
0.21030303030303
),
total_pop = c(503, 827, 900, 2989, 740, 813),
undertwo_percent = c(
0.391650099403579,
0.351874244256348,
0.397777777777778,
0.17096018735363,
0.301351351351351,
0.263222632226322
),
mobility_rate = c(
0.133702166897188,
0.0737753882915173,
0.196514423076923,
0.172716680111141,
0.0641304347826087,
0.0681084570690769
),
unemploy_rate = c(
0.0176991150442478,
0.0273203592814371,
0.109881724532621,
0.0127906976744186,
0.0344982078853047,
0.0281910728269381
),
median_income = c(41287, 46806, 41250, 64439,
46607, 36450),
renter_percent = c(
0.337653478854025,
0.310596310596311,
0.331877729257642,
0.268110942458949,
0.328686327077748,
0.365986394557823
),
blackaa_percent = c(
0.5451197053407,
0.264697193500739,
0.145906432748538,
0.152916262243007,
0.258583690987124,
0.530922930542341
),
hispanic_percent = c(
0.0105893186003683,
0.0803545051698671,
0.0400584795321637,
0.0137651107385511,
0.00822603719599428,
0.00666032350142721
),
transit_score_mean = c(0, 0, 0, 0, 0, 0),
life_expectancy = c(75.67, 75.67, 75.67, 75.67, 75.67, 75.67),
trend_life_expectancy = c(5.1, 5.1, 5.1, 5.1, 5.1, 5.1),
median_monthly_housing_costs = c(885,
885, 885, 885, 885, 885),
pestilence_2018 = c(2, 2, 2, 2, 2,
2),
total_pop_county = c(6772, 6772, 6772, 6772, 6772, 6772),
deaths_weather_pop = c(0, 0, 0, 0, 0, 0),
cost_weather_pop = c(
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636
),
Male_HSgrad = c(75, 68, 211, 189, 97,
42),
Male_SomeCollege = c(28, 18, 51, 111, 74, 38),
Male_AssocDeg = c(4,
6, 0, 63, 0, 21),
Male_BachDeg = c(7, 9, 0, 11, 0, 9),
Male_GradDeg = c(0,
0, 0, 29, 6, 0),
MaleEduAboveHS = c(114, 101, 262, 403, 177,
110),
Total_Male18.24 = c(145, 123, 285, 455, 202, 110),
MaleEduHSAbove_pop = c(
0.786206896551724,
0.821138211382114,
0.919298245614035,
0.885714285714286,
0.876237623762376,
1
),
Female_HSgrad = c(11, 60, 87, 156, 23, 83),
Female_SomeCollege = c(22,
25, 13, 47, 54, 65),
Female_AssocDeg = c(0, 0, 20, 82, 0,
0),
Female_BachDeg = c(5, 26, 0, 19, 0, 11),
Female_GradDeg = c(5,
16, 0, 0, 0, 0),
FemaleEduAboveHS = c(43, 127, 120, 304,
77, 159),
Total_Female18.24 = c(53, 127, 192, 581, 92, 198),
FemaleEduHSAbove_pop = c(
0.811320754716981,
1,
0.625,
0.523235800344234,
0.83695652173913,
0.803030303030303
)
),
row.names = c(NA,
6L),
class = "data.frame"
)
Here is my code:
#List of potential outcome variables to be plotted
# Names must match the columns in df exactly (e.g. "MaleEduHSAbove_pop")
variables <- c(
  "total_empl", "total_payroll", "total_establishments",
  "largest_employer", "largest_employer_bypayroll",
  "trend_employee_change", "trend_payroll_change", "trend_establishment_change",
  "damage_cost_weather_total", "deaths_weather_total",
  "medianrent", "vacancyrate", "total_pop", "undertwo_percent",
  "mobility_rate", "unemploy_rate", "median_income", "renter_percent",
  "blackaa_percent", "hispanic_percent", "median_monthly_housing_costs",
  "MaleEduHSAbove_pop", "FemaleEduHSAbove_pop"
)
# Define inputs
selectInput('state_name', label = 'Select a state', choices = lookup)
selectInput('DV', label = 'Outcome Measure', choices = variables)
#Filter data based on the State and outcome measure the user would like to investigate.
bar <- reactive({
  st <- df %>%
    filter(state == input$state_name)
  bp <- st %>%
    group_by(tract_type) %>%
    summarise(Outcome = mean(st[,input$DV]))
  return(bp)
})
bar
UPDATE
Right now, this code successfully filters the data by input$state_name, but there is an issue with the calculation of the means. The result is this:
# A tibble: 2 x 2
  tract_type Outcome
  <chr>        <dbl>
1 Contiguous 468296.
2 LICs       468296.
As you can see, the calculated means are identical. In fact, these values correspond to the grand mean (across both groups) of whichever variable is chosen for input$DV. So the filtered st data is not being split into the two levels of tract_type when the mean is computed.
Upvotes: 0
Views: 190
Reputation: 3341
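I see what you are trying to do. The problem is that inside summarise() you index the whole filtered data frame with st[, input$DV]. That returns the entire column regardless of the grouping, so every group gets the same overall mean. (Using input$DV on its own would not work either, because it is a string, and you cannot take the mean of a string.) What you want to do is summarise one of the columns in df by providing its name.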
In the following example, I specify the summarising variable manually. Note that investment_score_1_low_10_high does not have quotes. investment_score_1_low_10_high is what is called a symbol in R.
st <- df %>%
  filter(state == "Alabama") %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(investment_score_1_low_10_high))
But I think this should work:
bar <- reactive({
  # Create a symbol from string.
  mean_variable <- sym(input$DV)
  bp <- df %>%
    filter(state == input$state_name) %>%
    group_by(tract_type) %>%
    summarise(Outcome = mean(!! mean_variable, na.rm = TRUE))
  return(bp)
})
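To check this outside Shiny, here is a minimal sketch on made-up data. The toy data frame and the plain string dv (standing in for input$DV) are assumptions for illustration, not part of the original app. It compares the indexing approach from the question with the symbol approach above, and also shows the .data pronoun, which is a different but commonly used way to refer to a column by a string in dplyr.

```r
library(dplyr)
library(rlang)

# Toy data, made up for illustration (not the asker's df)
toy <- data.frame(
  tract_type = c("LICs", "Contiguous", "LICs", "Contiguous"),
  medianrent = c(500, 700, 300, 900)
)

dv <- "medianrent"  # hypothetical stand-in for input$DV

# Approach from the question: toy[, dv] is the whole column,
# so the grouping is ignored and each group gets the overall mean
buggy <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(toy[, dv]))

# Symbol approach: sym() turns the string into a symbol, !! injects it
fixed <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(!!sym(dv), na.rm = TRUE))

# Alternative technique: the .data pronoun looks the column up by name
fixed_pronoun <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(.data[[dv]], na.rm = TRUE))

print(buggy)
print(fixed)
print(fixed_pronoun)
```

On the toy data, buggy shows the same overall mean for both groups, while fixed and fixed_pronoun each give one mean per tract_type.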
Extra information about the use of !! and what it does can be found here, and even better, with examples, here.
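As a small illustration of what !! does, the sketch below uses rlang::expr() to capture code without running it. The column name is hypothetical; the point is only that !! replaces col_sym with the symbol it holds before the code is captured.

```r
library(rlang)

dv <- "medianrent"   # hypothetical column name held as a string
col_sym <- sym(dv)   # the symbol medianrent (no quotes)

# With !!, the symbol stored in col_sym is injected into the captured code
with_bang <- expr(mean(!!col_sym, na.rm = TRUE))
print(with_bang)

# Without !!, the captured code refers to the variable col_sym itself
without_bang <- expr(mean(col_sym, na.rm = TRUE))
print(without_bang)
```

The first expression prints as mean(medianrent, na.rm = TRUE), which is the code summarise() ends up evaluating per group; the second still mentions col_sym.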
Upvotes: 1