I have data on which stores thousands of customers visited, by year:
data <- "Customer_ID Year Store_Visited
1 1 2010 A
2 1 2011 A_B
3 1 2012 A_B
4 2 2010 A
5 2 2012 B
6 3 2010 A
7 3 2011 A
8 3 2011 A
"
data <- read.table(text=data, header = TRUE)
What I'd like to visualize is the impact of the opening of store B on visits to store A.
3 customers went to store A in 2010: there is a line of thickness 3 in 2010.
In 2011 store B opened. 1 customer went only to store A, 1 went to both stores, and the third one didn't come to either of our stores: a line of thickness 1 goes from A to A_B, and a line of thickness 1 goes from A to A.
What do you think of this way of visualizing these customers' behaviour?
How can I do it in R (with ggplot, for instance)?
For information, I have thousands of customers, so I may have to aggregate users who made the same visits in two consecutive years.
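One way to do that aggregation in base R, before any plotting, is sketched below. It rests on a few assumptions: `count_moves()` is a hypothetical helper, "consecutive" means consecutive years present in the data, customers missing from either year of a pair are left out, and customer 3's second 2011 row is taken to be a typo for 2012.

```r
# Sample data, assuming customer 3's second 2011 row was meant to be 2012
data <- read.table(text = "Customer_ID Year Store_Visited
1 2010 A
1 2011 A_B
1 2012 A_B
2 2010 A
2 2012 B
3 2010 A
3 2011 A
3 2012 A
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one row per (year pair, from store, to store), with n
# the number of customers who made that move
count_moves <- function(data) {
  years <- sort(unique(data$Year))
  out <- list()
  for (k in seq_len(length(years) - 1)) {
    from <- data[data$Year == years[k], ]
    to   <- data[data$Year == years[k + 1], ]
    # Inner join: only customers seen in both years contribute a move
    m <- merge(from, to, by = "Customer_ID", suffixes = c(".from", ".to"))
    if (nrow(m) == 0) next
    # Count customers per (from, to) combination
    agg <- aggregate(Customer_ID ~ Store_Visited.from + Store_Visited.to,
                     data = m, FUN = length)
    names(agg) <- c("from", "to", "n")
    agg$year_from <- years[k]
    agg$year_to <- years[k + 1]
    out[[length(out) + 1]] <- agg
  }
  do.call(rbind, out)
}

moves <- count_moves(data)
moves
```

Each row is one line of the diagram, with `n` as its thickness; for 2010 to 2011 it gives A to A and A to A_B with `n = 1` each, as described above. A long table like this is also the shape that ggplot2's `geom_segment()` works with (`x`, `y`, `xend`, `yend` aesthetics).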
This is an interesting problem. First, I think your data needs a minor change on line 8: Year should be 2012, since otherwise customer 3 is recorded twice in 2011. Thus:
data <- "Customer_ID Year Store_Visited
1 1 2010 A
2 1 2011 A_B
3 1 2012 A_B
4 2 2010 A
5 2 2012 B
6 3 2010 A
7 3 2011 A
8 3 2012 A
"
data <- read.table(text=data, header = TRUE)
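With thousands of customers, a typo like this is easy to miss. A quick sanity check, sketched here in base R, is to confirm that each customer appears at most once per year: `anyDuplicated()` returns the row index of the first repeated (Customer_ID, Year) pair, or 0 if there is none.

```r
# The question's rows; the first column becomes row names because the
# header has one field fewer than the data lines
original <- read.table(text = "Customer_ID Year Store_Visited
1 1 2010 A
2 1 2011 A_B
3 1 2012 A_B
4 2 2010 A
5 2 2012 B
6 3 2010 A
7 3 2011 A
8 3 2011 A
", header = TRUE)

# The correction: row 8 is customer 3's 2012 visit
corrected <- original
corrected$Year[8] <- 2012

# Index of the first repeated (Customer_ID, Year) pair, or 0 if none
anyDuplicated(original[c("Customer_ID", "Year")])   # 8
anyDuplicated(corrected[c("Customer_ID", "Year")])  # 0
```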
It's relatively easy to plot the diagram you're after from this using "lines". The first step is to work out how many people move from one store to another each year.
A) The possible moves are:
Stores = unique(data$Store_Visited)
#Get the range of possible moves
possibleMoves = character()
for(i in seq_along(Stores)){
  for(j in seq_along(Stores)){
    possibleMoves = c(possibleMoves, paste(Stores[i], Stores[j], sep="-"))
  }
}
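The nested loops grow `possibleMoves` one element at a time. A shorter equivalent, sketched here with base R's `outer()`, builds all store pairs at once and keeps the same order, which matters because the plotting loop later reads the `MoveMat` columns in that order. `Stores` is hard-coded here to what `unique(data$Store_Visited)` gives for the sample data.

```r
Stores <- c("A", "A_B", "B")  # unique(data$Store_Visited) for the sample data

# outer() applies paste() to every (from, to) pair; transposing before
# flattening makes the "to" store vary fastest, like the inner j loop
possibleMoves <- as.vector(t(outer(Stores, Stores, paste, sep = "-")))
possibleMoves
```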
B) The moves for the first year are:
Years = sort(unique(data$Year))
MoveMat = data.frame(Years)
for(i in seq_along(possibleMoves)){MoveMat[possibleMoves[i]]=0}
FirstYear = table(data$Year, data$Store_Visited)[1,]
#Assign by column name so the counts land in the matching "X-X" columns
MoveMat[1, paste(names(FirstYear), names(FirstYear), sep="-")] = as.vector(FirstYear)
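To see what this step stores, here is the first-year row computed for the corrected sample data (a small sketch using only base R's `table()`). The first row of the year-by-store table counts each store's visitors in the earliest year, and those counts go into the matching "stay" columns such as `A-A`.

```r
data <- read.table(text = "Customer_ID Year Store_Visited
1 2010 A
1 2011 A_B
1 2012 A_B
2 2010 A
2 2012 B
3 2010 A
3 2011 A
3 2012 A
", header = TRUE, stringsAsFactors = FALSE)

# Rows of the table are years (sorted), columns are stores
FirstYear <- table(data$Year, data$Store_Visited)[1, ]
FirstYear
# Names of the MoveMat columns these counts are written to
paste(names(FirstYear), names(FirstYear), sep = "-")
```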
C) The moves for every other year are:
for(i in seq_along(Years)[-1]){
  #all.y=TRUE keeps everyone seen this year; column 3 is last year's store and
  #column 5 this year's, so newcomers (NA last year) count as staying put
  temp = merge(subset(data, Year==Years[i-1]), subset(data, Year==Years[i]), by="Customer_ID", all.y=TRUE)
  temp_new = which(is.na(temp[,3])); temp[temp_new,3] = temp[temp_new,5]
  temp = table(paste(temp[,3], temp[,5], sep="-"))
  MoveMat[i, names(temp)] = as.vector(temp)
}
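The `all.y` argument decides who gets a line: customers present this year but not last year are kept and counted as staying in their current store, while customers who visited last year but not this year are dropped, so they get no line (as in the question's 2011 example). A sketch of that single step for 2011 to 2012 on the corrected sample data, using merge's default `.x`/`.y` column names instead of positions:

```r
data <- read.table(text = "Customer_ID Year Store_Visited
1 2010 A
1 2011 A_B
1 2012 A_B
2 2010 A
2 2012 B
3 2010 A
3 2011 A
3 2012 A
", header = TRUE, stringsAsFactors = FALSE)

prev <- data[data$Year == 2011, ]
curr <- data[data$Year == 2012, ]

# all.y = TRUE keeps every 2012 customer; customer 2 was absent in 2011,
# so Store_Visited.x is NA for that row
temp <- merge(prev, curr, by = "Customer_ID", all.y = TRUE)
newcomer <- is.na(temp$Store_Visited.x)
# Newcomers are counted as "staying" in the store they visited this year
temp$Store_Visited.x[newcomer] <- temp$Store_Visited.y[newcomer]
moves2012 <- table(paste(temp$Store_Visited.x, temp$Store_Visited.y, sep = "-"))
moves2012
```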
Now "MoveMat" holds, for each year, the number of customers making each move.
Given this format, it's relatively simple to plot the moves with "lines".
A) Set up the plotting window:
plot(c(1,length(Years)+1), c(1,length(Stores)), type="n", xlab="", ylab="Store", yaxt="n")
axis(2, at=seq_along(Stores), labels=Stores)
B) Then loop over the years, drawing one line per move:
for(y in seq_along(Years)){
  count = 0
  for(i in seq_along(Stores)){
    for(j in seq_along(Stores)){
      count = count+1
      if(MoveMat[y,count+1]>0){
        lines(c(y,y+1), c(i,j), lwd=MoveMat[y,count+1])
      }
    }
  }
}
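If you'd rather keep the moves in a long table (one row per line to draw), base R's `segments()` can draw them all in one vectorised call, and the axes can carry the store names and years. A sketch, assuming a hypothetical `edges` table whose rows are the non-zero cells of `MoveMat` for the corrected sample data (`step` is the `MoveMat` row, i.e. the year's position in `Years`) and a hypothetical helper `draw_moves()`:

```r
# Non-zero MoveMat cells for the corrected sample data, one row per line
edges <- data.frame(
  step = c(1, 2, 2, 3, 3, 3),
  from = c("A", "A", "A", "A", "A_B", "B"),
  to   = c("A", "A", "A_B", "A", "A_B", "B"),
  n    = c(3, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
Stores <- c("A", "A_B", "B")
Years  <- c(2010, 2011, 2012)

# Hypothetical helper: draws every move at once and returns the coordinates
draw_moves <- function(edges, stores, years, scale = 3) {
  y0 <- match(edges$from, stores)  # store name -> vertical position
  y1 <- match(edges$to, stores)
  plot.new()
  plot.window(xlim = c(1, length(years) + 1),
              ylim = c(0.5, length(stores) + 0.5))
  axis(2, at = seq_along(stores), labels = stores, las = 1)
  # Label each interval with the year whose MoveMat row it draws
  axis(1, at = seq_along(years) + 0.5, labels = years, tick = FALSE)
  title(ylab = "Store")
  segments(edges$step, y0, edges$step + 1, y1, lwd = edges$n * scale)
  invisible(data.frame(x0 = edges$step, y0 = y0, x1 = edges$step + 1, y1 = y1))
}

coords <- draw_moves(edges, Stores, Years)
```

With thousands of customers the counts get large, so `scale` would need to shrink (or `n` be normalised, e.g. divided by `max(n)`) to keep line widths readable.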
That produces a figure similar to the one you have sketched. Is that what you're after?
The whole code is below to make it easy for you to copy and paste:
data <- "Customer_ID Year Store_Visited
1 1 2010 A
2 1 2011 A_B
3 1 2012 A_B
4 2 2010 A
5 2 2012 B
6 3 2010 A
7 3 2011 A
8 3 2012 A
"
data <- read.table(text=data, header = TRUE)
#Arrange data to give width of steps each year
Stores = unique(data$Store_Visited)
#Get the range of possible moves
possibleMoves = character()
for(i in seq_along(Stores)){
  for(j in seq_along(Stores)){
    possibleMoves = c(possibleMoves, paste(Stores[i], Stores[j], sep="-"))
  }
}
#First year is just the first row of the year-by-store table
Years = sort(unique(data$Year))
MoveMat = data.frame(Years)
for(i in seq_along(possibleMoves)){MoveMat[possibleMoves[i]]=0}
FirstYear = table(data$Year, data$Store_Visited)[1,]
MoveMat[1, paste(names(FirstYear), names(FirstYear), sep="-")] = as.vector(FirstYear)
#Now the other years (newcomers count as staying in this year's store)
for(i in seq_along(Years)[-1]){
  temp = merge(subset(data, Year==Years[i-1]), subset(data, Year==Years[i]), by="Customer_ID", all.y=TRUE)
  temp_new = which(is.na(temp[,3])); temp[temp_new,3] = temp[temp_new,5]
  temp = table(paste(temp[,3], temp[,5], sep="-"))
  MoveMat[i, names(temp)] = as.vector(temp)
}
#Now plot the results
#Set up plot
plot(c(1,length(Years)+1), c(1,length(Stores)), type="n", xlab="", ylab="Store", yaxt="n")
axis(2, at=seq_along(Stores), labels=Stores)
#Plot each year
for(y in seq_along(Years)){
  count = 0
  for(i in seq_along(Stores)){
    for(j in seq_along(Stores)){
      count = count+1
      if(MoveMat[y,count+1]>0){
        lines(c(y,y+1), c(i,j), lwd=MoveMat[y,count+1]*3)
      }
    }
  }
}