Reputation: 127
The dataset represents which client (Cstid = Customer id) has made a purchase on which day.
I am having difficulty finding a way to plot the number of purchases per day and per month.
Below is an example of the dataset; in total I have 7505 observations.
"Cstid" "Date"
1 4195 19/08/17
2 3937 16/08/17
3 2163 07/09/17
4 3407 08/10/16
5 4576 04/11/16
6 3164 16/12/16
7 3174 18/08/15
8 1670 18/08/15
9 1671 18/08/15
10 4199 19/07/14
11 4196 19/08/14
12 6725 14/09/14
13 3471 14/09/13
I started by converting the Date column:
df$Date <- as.Date(df$Date, '%d/%m/%Y')
Then I counted the number of observations per date using:
library(data.table)
dt <- as.data.table(df)
dt[,days:=format(Date,"%d.%m.%Y")]
dt1 <- data.frame(dt[,.N,by=days])
Finally, I tried to plot the counts with:
plot(dt1$days, dt1$N,type="l")
But I get the following error message:
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Could someone please tell me how I should proceed?
Upvotes: 4
Views: 9209
Reputation: 42592
You need to specify a 2-digit year using %y (lower case) in order to convert the Date column from character to class Date.
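The following base R check, using two dates from the sample data, illustrates the difference. Per the strptime() documentation, %y reads a two-digit year and maps 00 to 68 to the years 2000 to 2068 (69 to 99 become 1969 to 1999), while %Y reads "17" literally as the year 17. The last lines suggest why the original plot() call failed regardless: format() returns character strings, which plot() tries to coerce to numbers, producing the NAs behind the 'xlim' error.

```r
x <- c("19/08/17", "14/09/13")

# %Y expects a year with century, so "17" is read as the year 17
wrong <- as.Date(x, "%d/%m/%Y")
# %y expects a two-digit year; 00 to 68 are mapped to 2000 to 2068
right <- as.Date(x, "%d/%m/%y")
print(wrong)
print(right)

# format() turns the Dates into character strings ...
days <- format(right, "%d.%m.%Y")
print(class(days))
# ... which plot() coerces to numeric, yielding NA and hence no finite xlim
suppressWarnings(print(as.numeric(days)))
```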
If ggplot2 is used for plotting, it will also do the aggregation: geom_bar() uses the count statistic (stat = "count") by default, which spares us from computing the aggregates (counts) beforehand.
For aggregation by month, I recommend mapping all dates to the first day of each month, e.g., using lubridate::floor_date(). This keeps a continuous (date) scale on the x-axis.
So, the complete code would be:
# convert Date from character to class Date using a 2 digit year
df$Date <- as.Date(df$Date, '%d/%m/%y')
library(ggplot2)
# aggregate by day
ggplot(df) + aes(x = Date) +
geom_bar()
# aggregate by month
ggplot(df) + aes(x = lubridate::floor_date(Date, "month")) +
geom_bar()
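If lubridate is not installed, the monthly flooring done by lubridate::floor_date(Date, "month") can be sketched in base R by formatting each date with a fixed day of "01" and converting back. The helper name floor_month is hypothetical; the result is still of class Date, so the x-axis stays continuous.

```r
# Hypothetical helper: map every Date to the first day of its month (base R only)
floor_month <- function(d) {
  as.Date(format(d, "%Y-%m-01"))
}

d <- as.Date(c("19/08/17", "16/08/17", "07/09/17"), "%d/%m/%y")
m <- floor_month(d)
print(m)
# Rows per month, i.e. what geom_bar() would count for each bar
print(table(m))
```

With the helper defined, aes(x = floor_month(Date)) should work as a drop-in for the lubridate call in the plot above.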
Alternatively, the dates can be mapped to a character month, e.g., "2015-08". But this turns the x-axis into a discrete scale, which no longer shows the elapsed time between purchases:
# aggregate by month using format() to create discrete scale
ggplot(df) + aes(x = format(Date, "%Y-%m")) +
geom_bar()
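If the discrete scale is preferred, one way to at least keep months without purchases visible is to turn the "YYYY-MM" labels into a factor whose levels cover every month in the data range; table() then reports zero for empty months, and in ggplot2 scale_x_discrete(drop = FALSE) keeps unused levels on the axis. A base R sketch of the counting step, using dates from the sample:

```r
d <- as.Date(c("18/08/15", "18/08/15", "19/07/14", "19/08/14", "14/09/14"), "%d/%m/%y")

# First day of the earliest and of the latest month in the data
first_month <- as.Date(format(min(d), "%Y-%m-01"))
last_month  <- as.Date(format(max(d), "%Y-%m-01"))

# Every month in between as "YYYY-MM" labels, in chronological order
all_months <- format(seq(first_month, last_month, by = "month"), "%Y-%m")

# Using all months as factor levels gives empty months a count of 0
month_f <- factor(format(d, "%Y-%m"), levels = all_months)
counts <- table(month_f)
print(counts)
```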
Upvotes: 4
Reputation: 1177
# reproducible data:
df <- data.frame(Cstid=c(4195,3937,2163,3407,4576,3164,3174,1670,1671,4199,4196,6725,3471),
Date=c('19/08/17','16/08/17','07/09/17','08/10/16','04/11/16','16/12/16','18/08/15','18/08/15',
'18/08/15','19/07/14','19/08/14','14/09/14','14/09/13'))
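# convert format (as.character() guards against Date having been stored as a factor;
# the years are assumed to be 20yy):
df$Date <- as.character(df$Date)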
Y <- paste('20',sapply(strsplit(df$Date,split = '/'),function(x){x[3]}),sep='')
M <- sapply(strsplit(df$Date,split = '/'),function(x){x[2]})
D <- sapply(strsplit(df$Date,split = '/'),function(x){x[1]})
df$Date <- as.POSIXct(paste(Y,M,D,sep='-'),format='%Y-%m-%d')
# count purchases per day and plot:
days <- unique(df$Date)
dcount <- vector()
for (i in 1:length(days)) {
dcount[i] <- nrow(df[df$Date==days[i],])
}
library(ggplot2)
ggplot(data=data.frame(days,dcount),aes(x=days,y=dcount))+geom_point()
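The loop above can also be written without explicit iteration, since table() counts how many rows share each value. The sketch below rebuilds the same days/dcount pairs from the reproducible data; it also swaps in direct parsing with %y in place of the manual string splitting.

```r
# Sample data from the answer above
df <- data.frame(Cstid = c(4195, 3937, 2163, 3407, 4576, 3164, 3174, 1670, 1671,
                           4199, 4196, 6725, 3471),
                 Date = c("19/08/17", "16/08/17", "07/09/17", "08/10/16", "04/11/16",
                          "16/12/16", "18/08/15", "18/08/15", "18/08/15", "19/07/14",
                          "19/08/14", "14/09/14", "14/09/13"),
                 stringsAsFactors = FALSE)

# %y parses the two-digit year directly (00 to 68 become 20xx)
df$Date <- as.POSIXct(df$Date, format = "%d/%m/%y")

# One vectorised call replaces the loop: count rows per calendar date
tab <- table(format(df$Date, "%Y-%m-%d"))
daily <- data.frame(days = as.Date(names(tab)), dcount = as.vector(tab))
print(daily)
```

The resulting data frame can be passed to the same ggplot() call with aes(x = days, y = dcount).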
#count per month plot:
df$month <- months(df$Date)
mon <- unique(df$month)
mcount <- vector()
for (i in 1:length(mon)) {
mcount[i] <- nrow(df[df$month==mon[i],])
}
ggplot(data.frame(mon,mcount),aes(x=mon,y=mcount))+geom_point()
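Note that months() returns only the month name (e.g. "August"), so purchases from August 2014, 2015 and 2017 are pooled into a single point, and ggplot2 places the names in alphabetical rather than calendar order. If counts per month of each year are wanted instead, a sketch keyed on "YYYY-MM" labels (which also sort chronologically) could look like this:

```r
df <- data.frame(Date = as.Date(c("19/08/17", "16/08/17", "07/09/17", "18/08/15",
                                  "19/08/14", "14/09/14"), "%d/%m/%y"))

# Calendar month only: the same month in different years is pooled
by_name <- table(months(df$Date))

# Year and month: "YYYY-MM" keeps years apart and sorts chronologically
by_year_month <- table(format(df$Date, "%Y-%m"))
print(by_name)
print(by_year_month)
```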
Upvotes: 1