Reputation: 197
I am having trouble creating grouped summary statistics.
Below is the code that I'm using to create the summary dataset:
library(dplyr)
#sample dataset
D            A          B                 C          VAL  PD
Agriculture  Services   Bought with Cash  01OCT2014  10   0.4435714
Agriculture  Grain      Bought with Cash  01OCT2014  10   0.7266667
Agriculture  Livestock  Bought with Cash  01OCT2014  10   1.1372414
Agriculture  Fr, ve     Bought with Cash  01OCT2014  10   1.5170370
Agriculture  Livestock  Financed          01OCT2014  76   1.1372414
Agriculture  Fr, ve     Financed          01OCT2014  76   1.5170370
Agriculture  Grain      Financed          01OCT2014  76   0.7266667
Agriculture  Services   Financed          01OCT2014  76   0.4435714
Agriculture  Services   Insurance         01OCT2014  10   0.4435714
Agriculture  Livestock  Insurance         01OCT2014  10   1.1372414
groupDF <- select.other %>%
  group_by(.dots=c("A","B","C")) %>%
  summarize(PD=mean(PD),VAL=mean(VAL))
I'm expecting the dataset to have the mean PD and mean VAL grouped by A, B, and C, for example:
A B C PD VAL
Services Bought with Cash 01OCT2017 1 10
Instead I am getting a single row for the whole dataset:
PD VAL
0.8574816 6059877
Any help or guidance will be appreciated.
Upvotes: 1
Views: 58
Reputation: 887193
We can use group_by_at if the grouping columns are given as strings:
library(dplyr)
select.other %>%
  group_by_at(vars(c("A","B","C"))) %>%
  summarize(PD=mean(PD),VAL=mean(VAL))
# A tibble: 10 x 5
# Groups:   A, B [10]
#   A         B                C            PD   VAL
#   <chr>     <chr>            <chr>     <dbl> <dbl>
# 1 Fr, ve    Bought with Cash 01OCT2014 1.52     10
# 2 Fr, ve    Financed         01OCT2014 1.52     76
# 3 Grain     Bought with Cash 01OCT2014 0.727    10
# 4 Grain     Financed         01OCT2014 0.727    76
# 5 Livestock Bought with Cash 01OCT2014 1.14     10
# 6 Livestock Financed         01OCT2014 1.14     76
# 7 Livestock Insurance        01OCT2014 1.14     10
# 8 Services  Bought with Cash 01OCT2014 0.444    10
# 9 Services  Financed         01OCT2014 0.444    76
#10 Services  Insurance        01OCT2014 0.444    10
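On dplyr 1.0.0 or newer, group_by_at is marked as superseded, and the same grouping can be written with across() and all_of(). The following is only a sketch under that version assumption: df is a cut-down stand-in for select.other, and group_cols is a hypothetical name for the vector of column names.

```r
library(dplyr)

# Cut-down stand-in for select.other (same column names, made-up values)
df <- data.frame(
  A   = c("Services", "Grain", "Services", "Grain"),
  B   = c("Financed", "Financed", "Financed", "Insurance"),
  C   = rep("01OCT2014", 4),
  VAL = c(10L, 76L, 30L, 10L),
  PD  = c(0.4, 0.7, 0.6, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical name: the grouping columns supplied as strings
group_cols <- c("A", "B", "C")

res <- df %>%
  group_by(across(all_of(group_cols))) %>%                      # group by the named columns
  summarise(PD = mean(PD), VAL = mean(VAL), .groups = "drop")   # one row per A/B/C combination

res
```

Here .groups = "drop" returns an ungrouped tibble; leaving it out keeps the result grouped by A and B, as in the output above.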
Another option is to convert the strings to symbols and then unquote-splice them into group_by with !!!:
select.other %>%
  group_by(!!! rlang::syms(c("A","B","C"))) %>%
  summarize(PD=mean(PD),VAL=mean(VAL))
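To see what the splice is doing, it may help to build the symbols separately and compare against grouping by bare column names. This is a small sketch on a toy data frame, assuming rlang is installed and dplyr is 1.0.0 or newer (for .groups); cols and sym_list are hypothetical names.

```r
library(dplyr)

cols <- c("A", "B", "C")        # hypothetical name for the grouping columns
sym_list <- rlang::syms(cols)   # a list of three symbols: A, B, C

# Toy data, not the data from the question
df <- data.frame(A = c("x", "x", "y"), B = "b", C = "c", PD = c(1, 3, 5))

# !!! splices the list into the call, so this behaves like group_by(A, B, C)
spliced <- df %>%
  group_by(!!!sym_list) %>%
  summarise(PD = mean(PD), .groups = "drop")

# The same grouping written with bare column names
direct <- df %>%
  group_by(A, B, C) %>%
  summarise(PD = mean(PD), .groups = "drop")

isTRUE(all.equal(as.data.frame(spliced), as.data.frame(direct)))
```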
Data used:
select.other <- structure(list(D = c("Agriculture", "Agriculture", "Agriculture",
"Agriculture", "Agriculture", "Agriculture", "Agriculture", "Agriculture",
"Agriculture", "Agriculture"), A = c("Services", "Grain", "Livestock",
"Fr, ve", "Livestock", "Fr, ve", "Grain", "Services", "Services",
"Livestock"), B = c("Bought with Cash", "Bought with Cash", "Bought with Cash",
"Bought with Cash", "Financed", "Financed", "Financed", "Financed",
"Insurance", "Insurance"), C = c("01OCT2014", "01OCT2014", "01OCT2014",
"01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014", "01OCT2014",
"01OCT2014", "01OCT2014"), VAL = c(10L, 10L, 10L, 10L, 76L, 76L,
76L, 76L, 10L, 10L), PD = c(0.4435714, 0.7266667, 1.1372414,
1.517037, 1.1372414, 1.517037, 0.7266667, 0.4435714, 0.4435714,
1.1372414)), class = "data.frame", row.names = c(NA, -10L))
Upvotes: 4