I am trying to plot Date on the X axis and Revenue on the Y axis. I have data for about 16000 customers, with revenue aggregated weekly. The data set looks like the example below, except that the real data covers about 100 weeks and 16000 customers:
CustNum Date Revenue
1 2013-01-07 35
1 2013-01-14 23
1 2013-01-21 42
1 2013-01-28 65
2 2013-01-07 78
2 2013-01-14 48
2 2013-01-21 85
2 2013-01-28 34
I would like to plot this data on a single plot, with one line per customer. In other words, the plot will have about 16000 lines, each showing one customer's weekly Revenue.
I understand that a plot with 16000 lines will be very cluttered, so I would also welcome suggestions for a better way to plot this data.
I tried the following code, which didn't give me the desired result:
p <- ggplot() + geom_line(data=res,aes(x=Date,y=Revenue,color=custnum))
This didn't give me multiple lines for multiple customers.
So I have two questions:
What could be a better way to represent this data?
How can I fix my code to show 16000 lines on a single plot? (This matters less to me if there is a better way to represent the data.)
Any help with this will be much appreciated.
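A likely cause of the single line, offered as a sketch rather than a confirmed diagnosis: in ggplot2 a continuous colour mapping (such as a numeric customer number) does not split the data into separate lines, so geom_line() joins every point into one line. Mapping the customer ID to the group aesthetic draws one line per customer. Also note that the sample table names the column CustNum while the attempt uses custnum, and R names are case-sensitive. The data frame below is a hypothetical rebuild of the sample table under the name res, and alpha = 0.1 is an arbitrary choice; with 16000 lines a much smaller value would probably be needed.

```r
library(ggplot2)

# Hypothetical rebuild of the sample table from the question (long format)
res <- data.frame(
  CustNum = rep(1:2, each = 4),
  Date    = as.Date(rep(c("2013-01-07", "2013-01-14", "2013-01-21", "2013-01-28"), 2)),
  Revenue = c(35, 23, 42, 65, 78, 48, 85, 34)
)

# group = CustNum draws one line per customer; a numeric colour mapping alone
# would not split the data. alpha makes heavily overlapping lines readable.
p <- ggplot(res, aes(x = Date, y = Revenue, group = CustNum)) +
  geom_line(alpha = 0.1)
print(p)
```

Mapping colour = factor(CustNum) would also split the lines, but with 16000 customers the resulting legend would be unusable, which is why the sketch relies on group alone.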
Here is a base R outline of the approach in my comment above. I use a large matrix to hold all the data: the first column indicates whether the customer got the treatment (0 = no, 1 = yes), and the remaining 100 columns hold the weekly revenue.
First, I simulate some data with a lot of week-to-week noise.
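# First 8000 records: a stable pattern around 100 (column 1 = 0, no treatment)
notreat <- matrix(c(rep(0, 8000), 100 + rnorm(8000 * 100, 0, 5)), nrow = 8000)
# Second 8000 records (column 1 = 1) get no treatment for the first 50 weeks...
treat <- matrix(c(rep(1, 8000), 100 + rnorm(8000 * 50, 0, 5)), nrow = 8000)
# ...then get the treatment for the last 50 weeks: mean revenue rises by 0.75 per week.
# The mean vector has length 50 so that, filled by row, treated week 50 + j gets mean 100 + 0.75 * j.
treat <- cbind(treat,
               matrix(rnorm(50 * 8000, 100 + 0.75 * (1:50)), nrow = 8000, ncol = 50, byrow = TRUE))
m <- rbind(notreat, treat)  # 16000 x 101: treatment flag + 100 weekly revenues
# Use a colour palette with transparency (alpha = 0.01) to be able to discern the overall pattern
old_pal <- palette(c(rgb(0.4, 0, 0, 0.01), rgb(0, 0, 0.4, 0.01)))
# This will take several seconds to render 16000 lines; lty = 1 keeps every line solid
matplot(t(m[, 2:101]), col = 1 + m[, 1], type = "l", lty = 1, xlab = "Week", ylab = "Revenue")
palette(old_pal)  # restore the previous palette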
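One way to make the overall pattern even easier to see, not part of the original answer, is to draw each group's weekly mean on top of the faint individual lines. The sketch below rebuilds a smaller version of the simulated matrix m (500 customers per group, a hypothetical size chosen so it runs quickly), computes the group means with colMeans(), and overlays them with matlines(). The colours and alpha values are arbitrary.

```r
# Smaller, hypothetical version of the simulated matrix m (500 customers per group)
set.seed(42)
n <- 500
notreat <- matrix(c(rep(0, n), 100 + rnorm(n * 100, 0, 5)), nrow = n)
treat <- cbind(matrix(c(rep(1, n), 100 + rnorm(n * 50, 0, 5)), nrow = n),
               matrix(rnorm(50 * n, 100 + 0.75 * (1:50)), nrow = n, ncol = 50, byrow = TRUE))
m <- rbind(notreat, treat)

# Weekly mean revenue per group: a 100 x 2 matrix (column 1 = untreated, column 2 = treated)
grp_means <- sapply(0:1, function(g) colMeans(m[m[, 1] == g, 2:101]))

# Faint individual lines first, then the bold group means on top
matplot(t(m[, 2:101]),
        col = ifelse(m[, 1] == 1, rgb(0, 0, 0.4, 0.05), rgb(0.4, 0, 0, 0.05)),
        type = "l", lty = 1, xlab = "Week", ylab = "Revenue")
matlines(1:100, grp_means, col = c("darkred", "darkblue"), lty = 1, lwd = 3)
```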
You can get your data frame into the type of matrix I build here using something like unstack() or the reshape package.
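A minimal base R sketch of the unstack() route, assuming the long data frame has the CustNum, Date and Revenue columns from the question and that every customer has a value for every week. Here the rows are weeks and the columns are customers, which is already the orientation matplot() expects, so no t() is needed.

```r
# Sample data from the question, in long format
df <- data.frame(
  CustNum = rep(1:2, each = 4),
  Date    = as.Date(rep(c("2013-01-07", "2013-01-14", "2013-01-21", "2013-01-28"), 2)),
  Revenue = c(35, 23, 42, 65, 78, 48, 85, 34)
)

# Sort by customer, then date, so each customer's column is in week order
df <- df[order(df$CustNum, df$Date), ]

# unstack() builds one column per customer (rows = weeks). If customers have
# different numbers of weeks, it returns a list instead of a data frame.
wide <- as.matrix(unstack(df, Revenue ~ CustNum))

# matplot() draws one line per column, i.e. one line per customer
matplot(wide, type = "l", lty = 1, xlab = "Week", ylab = "Revenue")
```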
Maybe you are looking for something like this:
library(ggplot2)
# Sample data from the question, in long format
df <- data.frame(
  CustNum = factor(rep(c("1", "2"), each = 4)),
  Date    = as.Date(rep(c("2013-01-07", "2013-01-14", "2013-01-21", "2013-01-28"), 2)),
  Revenue = c(35, 23, 42, 65, 78, 48, 85, 34)
)
## create the factor variable (in this toy example customer 1 is the campaign group)
df$Treatment <- ifelse(df$CustNum == "1", "campaign", "no campaign")
ggplot(df) +
  geom_point(aes(x = Date, y = Revenue, color = Treatment), size = 5) +
  facet_wrap(~Treatment)
Now you can imagine doing the same thing across all your data points, swapping geom_point for geom_boxplot or geom_errorbar. Alternatively, you could skip faceting and plot everything in one graph, but then you may need to set position = "dodge" in the geom call so that the boxplots are not drawn on top of one another.
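A sketch of the single-panel, dodged version, using hypothetical simulated data: 100 customers over four weeks, with the first 50 assigned to the campaign group and a made-up effect on mean revenue. Converting Date to a factor gives one box per week and treatment group, and position_dodge() places the two groups side by side. In recent ggplot2 versions geom_boxplot() already dodges by default, so the explicit position argument mainly documents the intent.

```r
library(ggplot2)

# Hypothetical simulated data: 100 customers x 4 weeks, first 50 in the campaign
set.seed(1)
weeks <- as.Date("2013-01-07") + 7 * (0:3)
df <- expand.grid(CustNum = 1:100, Date = weeks)
df$Treatment <- ifelse(df$CustNum <= 50, "campaign", "no campaign")
df$Revenue <- rnorm(nrow(df), mean = ifelse(df$Treatment == "campaign", 60, 50), sd = 10)

# One box per week and treatment group, dodged side by side in a single panel
p <- ggplot(df, aes(x = factor(Date), y = Revenue, fill = Treatment)) +
  geom_boxplot(position = position_dodge(width = 0.8)) +
  labs(x = "Week starting")
print(p)
```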