Rob

Reputation: 1048

Plotting grouped averages in R

I am estimating probabilities with a linear regression y ~ x, where x takes floating-point values in a fixed range, e.g. between 0 and 5, and every observed y is either 0 or 1. Note that x values can repeat, e.g. the data look like (0.1, 0), (0.1, 1), (0.1, 0), (0.12, 1), etc.

Running the regression itself is fine, and I can also plot the fitted line, e.g. with the ggplot2 package:

qplot(x,y,data=data,geom='smooth',method='lm')

Since a scatter plot of the raw data would just pile up points at y=0 and y=1, I would like to plot "grouped averages" instead, e.g. the average y value over all x in [0,0.2) as one point, another for [0.2,0.4), and so on.

Ideally, the plot would also show the sample size behind each point: if one grouped average is based on less data than another, it should be drawn as a smaller circle, as in a bubble plot.

Upvotes: 1

Views: 67

Answers (1)

Rohit

Reputation: 2017

Use cut to split the samples into intervals, and data.table for quick aggregation per interval. After that, it is just a matter of mapping the group size to the size of the points in your plot:

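library(data.table)
library(ggplot2)

# Example data
x <- rnorm(100)
y <- 5*x + 6 + rnorm(100, sd = 0.2)
DT <- data.table(x, y)
# Assign each x to a half-open bin [a, b) of width 0.2 (x outside [-3, 3) gets an NA bin)
DT[, bin := cut(x, seq(-3, 3, 0.2), right = FALSE)]
# Aggregate table: mean x, mean y and sample size N per bin
DT1 <- DT[, .(mx = mean(x), my = mean(y), .N), by = bin]
# Regression line plus one point per bin, sized by N
qplot(x, y, data = DT, geom = 'smooth', method = 'lm') +
    geom_point(data = DT1, aes(x = mx, y = my, size = N))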

Sample output
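
For data shaped like the question's (x between 0 and 5, y either 0 or 1), the same approach could look roughly like this. It is only a sketch: the data are simulated as a stand-in for the real data, the names n, binned and p are made up for illustration, and it uses ggplot() instead of qplot().

```r
library(data.table)
library(ggplot2)

set.seed(1)
# Simulated stand-in for the real data: x in [0, 5], y is 0 or 1
n <- 500
x <- runif(n, 0, 5)
y <- rbinom(n, size = 1, prob = x / 5)
DT <- data.table(x, y)

# Half-open bins [0,0.2), [0.2,0.4), ...; include.lowest = TRUE closes the last bin at 5
DT[, bin := cut(x, breaks = seq(0, 5, by = 0.2), right = FALSE, include.lowest = TRUE)]

# One row per bin: mean x, mean y (the share of 1s) and sample size N
binned <- DT[, .(mx = mean(x), my = mean(y), N = .N), by = bin]

# Linear fit on the raw 0/1 data, with the bin averages drawn as bubbles sized by N
p <- ggplot(DT, aes(x, y)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(data = binned, aes(x = mx, y = my, size = N))
print(p)
```

Each point's y value is the fraction of 1s in its bin, so it can be compared directly with the fitted line, and the point size shows how many observations went into that average.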

Upvotes: 1
