I'm hoping someone out there has a solution for using some form of expand.grid inside a dplyr pipe. I am doing some modeling where I have a few different groups (Type below), and the groups have different ranges for the x and y data. Once I run a gam on the data, I want to plot the predictions, but I only want to predict over the range that each group occupies, not over the whole range of the data set.
I already have a working example posted below, but I'm wondering if there is a way to get around using a loop and complete my task.
Cheers
require(ggplot2)
require(dplyr)
# Create some data
df = data.frame(Type = rep(c("A","B"), each = 100),
                x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))
# and if you want to check out the data
ggplot(df,aes(x,y,col=Type)) + geom_point() + stat_ellipse()
# OK so I have no issue extracting the minimum and maximum values
# for each type
df_summ = df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y))
df_summ
# and I can create a loop and use the expand.grid function to get my
# desired output
test = NULL
for(ii in c("A","B")){
  df1 = df_summ[df_summ$Type == ii,]
  x = seq(df1$xmin, df1$xmax, length.out = 10)
  y = seq(df1$ymin, df1$ymax, length.out = 10)
  coords = expand.grid(x = x, y = y)
  coords$Type = ii
  test = rbind(test, coords)
}
ggplot(test, aes(x,y,col = Type)) + geom_point()
But what I would really like is to bypass the loop and get the same output straight from the pipe. I've tried a few combinations using the do() function, but without success; the one posted below is just one of many failed attempts:
df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  do(data.frame(x = seq(xmin, xmax, length.out = 10),
                y = seq(ymin, ymax, length.out = 10)))
# this last line returns an error
# Error in is.finite(from) :
# default method not implemented for type 'closure'
Upvotes: 3
Views: 827
Reputation: 6277
Using the data_grid function from the modelr package, here's one way to do it:
library(dplyr)
library(modelr)
library(ggplot2)  # needed for the plot at the end of the pipe
df %>%
  group_by(Type) %>%
  data_grid(x, y) %>%
  ggplot(aes(x,y, color = Type)) + geom_point()
For each group, this approach generates one row for every combination of the unique values of x and y in that group. So each x-y pair in the resulting data frame is built only from values of x and y that actually appear in the data.
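If a regular 10 x 10 grid per group is wanted instead (as in the question's loop), one option is to combine data_grid with modelr's seq_range helper. This is only a sketch: the seed and the object name grid are assumptions added for illustration, and the data are re-created here so the snippet runs on its own.

```r
library(dplyr)
library(modelr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# seq_range(x, 10) gives 10 evenly spaced values between min(x) and max(x);
# because the data are grouped, the ranges are computed within each Type
grid <- df %>%
  group_by(Type) %>%
  data_grid(x = seq_range(x, 10),
            y = seq_range(y, 10)) %>%
  ungroup()
```

This keeps the grid small and evenly spaced, whereas data_grid(x, y) crosses every observed x with every observed y within a group.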
Upvotes: 1
Reputation: 206546
Your do() attempt was almost right. The trick is just to re-group after the summarize, which drops the grouping here because Type is the only grouping variable. You also need to grab the values from the data in the chain using .$. Try this:
test <- df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  group_by(Type) %>%
  do(expand.grid(x = seq(.$xmin, .$xmax, length.out = 10),
                 y = seq(.$ymin, .$ymax, length.out = 10)))
ggplot(test, aes(x,y,col = Type)) + geom_point()
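As a possible alternative on newer setups, the summarize and do() steps can be collapsed into one reframe() call. This is a sketch that assumes dplyr 1.1.0 or later (where reframe() is available) and uses tidyr::expand_grid in place of expand.grid; the seed and the name test2 are made up for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# reframe() may return any number of rows per group; the unnamed data frame
# produced by expand_grid() is unpacked into the x and y columns
test2 <- df %>%
  group_by(Type) %>%
  reframe(expand_grid(x = seq(min(x), max(x), length.out = 10),
                      y = seq(min(y), max(y), length.out = 10)))
```

The result should contain the same grid points as the do() version, although expand_grid orders the rows differently from expand.grid (the last variable varies fastest).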
Upvotes: 2