Ophelie

Reputation: 713

R: visualize the behaviour of customers

I have data on which store each of several thousand customers visited, by year:

data <- "Customer_ID Year Store_Visited 
1          1        2010         A         
2          1        2011         A_B         
3          1        2012         A_B         
4          2        2010         A        
5          2        2012         B  
6          3        2010         A
7          3        2011         A
8          3        2011         A
 "
data <- read.table(text=data, header = TRUE)

What I'd like to visualize is the impact of the opening of store B on the number of customers visiting store A.

Here is a proposal :

  1. Three customers visited store A in 2010: there is a line of thickness 3 in 2010.

  2. In 2011, store B opened. One customer visited only store A, one visited both stores, and the third did not visit either store: a line of thickness 1 goes from A to A_and_B, and a line of thickness 1 goes from A to A.

  3. In 2012, the customer who visited both stores the previous year did so again: a line of thickness 1 goes from A_and_B to A_and_B. The customer who visited store A the previous year visited it again: a line of thickness 1 goes from A to A. One customer visited store B without having visited either store the previous year: a line of thickness 1 goes from B to B.

http://imgur.com/U9UG4Hp

What do you think of this way of visualizing the behaviour of these customers?

How can I do it in R (with ggplot, for instance)?

For information, I have thousands of customers, so I will probably have to aggregate customers who made the same visits in two consecutive years.

Upvotes: 2

Views: 125

Answers (1)

ThatGuy

Reputation: 1233

This is an interesting problem. First, I think your data needs a minor change in row 8: Year should be 2012 rather than 2011. Thus:

data <- "Customer_ID Year Store_Visited 
1          1        2010         A         
2          1        2011         A_B         
3          1        2012         A_B         
4          2        2010         A        
5          2        2012         B  
6          3        2010         A
7          3        2011         A
8          3        2012         A
 "
data <- read.table(text=data, header = TRUE)

It's relatively easy to plot the diagram you're after from this using base R's lines():

Step 1: Arrange the data for plotting

This involves working out the number of people going from one place to another each year.

A) The possible moves are:

Stores = unique(data$Store_Visited)
#Get the range of possible moves
possibleMoves = character()
for(i in seq(1,length(Stores))){
    for(j in seq(1, length(Stores))){
        possibleMoves = c(possibleMoves, paste(Stores[i], Stores[j], sep="-"))
    }
}
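As a side note, the same set of moves could probably be built without the nested loop. The following is only a sketch, assuming the three store labels from the example data; outer() and t() are base R, and the hard-coded Stores vector is there just to keep the snippet self-contained.

```r
# Store labels as they appear in the example data (hard-coded for illustration)
Stores <- c("A", "A_B", "B")

# outer() applies paste() to every (from, to) pair and returns a 3 x 3 matrix.
# t() followed by as.vector() reads that matrix row by row, so the order
# matches the nested loop: the "from" store varies slowest.
possibleMoves <- as.vector(t(outer(Stores, Stores, paste, sep = "-")))
print(possibleMoves)
```

The result has one entry per ordered pair of stores, so it grows quadratically with the number of distinct store labels, exactly as the loop does.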

B) The moves for the first year are:

Years = unique(data$Year)
MoveMat = data.frame(Years)
for(i in seq(1,length(possibleMoves))){MoveMat[possibleMoves[i]]=0}
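# Use the full column name rather than relying on partial matching of $Store
FirstYear = table(data$Year, data$Store_Visited)[1,]
# Assign by column name so each count lands in its own "X-X" column
MoveMat[1, paste(names(FirstYear), names(FirstYear), sep="-")] = as.vector(FirstYear)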

C) The moves for every other year are:

for(i in seq(2, length(Years))){
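    # Right join: every customer seen this year, with last year's store if any
    temp = merge(subset(data, Year==Years[i-1]), subset(data, Year==Years[i]), by="Customer_ID", all.y=TRUE)
    # Customers with no visit last year are treated as staying in place
    temp_new = which(is.na(temp[,3]))
    temp[temp_new,3] = temp[temp_new,5]
    temp = table(paste(temp[,3], temp[,5],sep="-"))
    # Assign by column name so counts cannot be misaligned by sort order
    MoveMat[i, names(temp)] = as.vector(temp)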
}

Now we have the moves by year in "MoveMat"
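With thousands of customers, a long table with one row per move may be easier to work with than a wide matrix. Below is a minimal sketch of that idea using only base R (transform, merge, aggregate). The names visits, prev, moves and flow are made up for this illustration, and it assumes that "previous year" means exactly Year - 1, which is slightly different from the loop above (that one uses the previous year present in the data).

```r
# Example data from above (with the corrected 2012 row for customer 3)
visits <- data.frame(
  Customer_ID   = c(1, 1, 1, 2, 2, 3, 3, 3),
  Year          = c(2010, 2011, 2012, 2010, 2012, 2010, 2011, 2012),
  Store_Visited = c("A", "A_B", "A_B", "A", "B", "A", "A", "A"),
  stringsAsFactors = FALSE
)

# Shift each visit forward one year so it lines up with the following year
prev <- transform(visits, Year = Year + 1)
names(prev)[names(prev) == "Store_Visited"] <- "From"

# Right join: keep every visit and attach the previous year's store, if any
moves <- merge(prev, visits, by = c("Customer_ID", "Year"), all.y = TRUE)
names(moves)[names(moves) == "Store_Visited"] <- "To"

# Same convention as the loop above: a customer with no visit in the
# previous year is counted as a horizontal move (From = To)
no_prev <- is.na(moves$From)
moves$From[no_prev] <- moves$To[no_prev]

# One row per (Year, From, To) with the number of customers making that move
moves$n <- 1
flow <- aggregate(n ~ Year + From + To, data = moves, FUN = sum)
flow <- flow[order(flow$Year, flow$From, flow$To), ]
print(flow)
```

Because every visit in the first year has no predecessor, the first year is handled by the same rule as new customers, so no special case is needed.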

Step 2: Plot the moves

Given the new format, it's relatively simple to plot this with lines():

A) Set up the plotting window

plot(c(1,length(Years)+1), c(1,length(Stores)), col=0)

B) And cycle through plotting each year:

for(y in seq(1,length(Years))){
    count = 0
    for(i in seq(1,length(Stores))){
        for(j in seq(1,length(Stores))){
            count = count+1
            if(MoveMat[y,count+1]>0){
                lines(c(y,y+1), c(i,j), lwd=MoveMat[y,count+1])
            }
        }
    }   
}

That produces a figure similar to the one you have sketched. Is that what you're after?
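If labelled axes would help, a similar picture could be drawn from a long table of moves with one vectorised call to segments(). This is only a sketch in base graphics: the flow table is typed in by hand from the example data, the column names Year, From, To and n are invented for the illustration, and the factor of 3 on the line width is an arbitrary choice to keep thin lines visible.

```r
# Hand-written table of moves: n customers went From one store To another
# (values taken from the example data; column names are illustrative)
flow <- data.frame(
  Year = c(2010, 2011, 2011, 2012, 2012, 2012),
  From = c("A", "A", "A", "A", "A_B", "B"),
  To   = c("A", "A", "A_B", "A", "A_B", "B"),
  n    = c(3, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

stores <- sort(unique(c(flow$From, flow$To)))
years  <- sort(unique(flow$Year))

# Numeric positions: x = index of the year, y = index of the store
x0 <- match(flow$Year, years)
y0 <- match(flow$From, stores)
y1 <- match(flow$To, stores)

# Empty plot with named axes instead of bare indices
plot(0, 0, type = "n",
     xlim = c(1, length(years) + 1), ylim = c(0.5, length(stores) + 0.5),
     xlab = "Year", ylab = "Store visited", axes = FALSE)
axis(1, at = seq_along(years), labels = years)
axis(2, at = seq_along(stores), labels = stores, las = 1)
box()

# segments() is vectorised, so one call draws every move
segments(x0, y0, x0 + 1, y1, lwd = 3 * flow$n)
```

Each move is drawn from its year's position to the next position on the x axis, as in the loop above.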

The whole code is below to make it easy for you to copy and paste:

data <- "Customer_ID Year Store_Visited 
1          1        2010         A         
2          1        2011         A_B         
3          1        2012         A_B         
4          2        2010         A        
5          2        2012         B  
6          3        2010         A
7          3        2011         A
8          3        2012         A
 "
data <- read.table(text=data, header = TRUE)

#Arrange data to give width of steps each year
Stores = unique(data$Store_Visited)
#Get the range of possible moves
possibleMoves = character()
for(i in seq(1,length(Stores))){
    for(j in seq(1, length(Stores))){
        possibleMoves = c(possibleMoves, paste(Stores[i], Stores[j], sep="-"))
    }
}
#First year is just the first column of the year-store table
Years = unique(data$Year)
MoveMat = data.frame(Years)
for(i in seq(1,length(possibleMoves))){MoveMat[possibleMoves[i]]=0}
FirstYear = table(data$Year, data$Store_Visited)[1,]
#Assign by column name so each count lands in its own "X-X" column
MoveMat[1, paste(names(FirstYear), names(FirstYear), sep="-")] = as.vector(FirstYear)
#Now the other years
for(i in seq(2, length(Years))){
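    #Right join: every customer seen this year, with last year's store if any
    temp = merge(subset(data, Year==Years[i-1]), subset(data, Year==Years[i]), by="Customer_ID", all.y=TRUE)
    #Customers with no visit last year are treated as staying in place
    temp_new = which(is.na(temp[,3]))
    temp[temp_new,3] = temp[temp_new,5]
    temp = table(paste(temp[,3], temp[,5],sep="-"))
    #Assign by column name so counts cannot be misaligned by sort order
    MoveMat[i, names(temp)] = as.vector(temp)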
}

#Now plot the results
#Set up plot
plot(c(1,length(Years)+1), c(1,length(Stores)), col=0)
#Plot each year
for(y in seq(1,length(Years))){
    count = 0
    for(i in seq(1,length(Stores))){
        for(j in seq(1,length(Stores))){
            count = count+1
            if(MoveMat[y,count+1]>0){
                lines(c(y,y+1), c(i,j), lwd=MoveMat[y,count+1]*3)
            }
        }
    }   
}

Upvotes: 1
