Reputation: 181
This question is related to my previous one, Subsetting a dataframe for a specified month and year
I use the command
sales <- read.csv("mysales.csv", colClasses="character")
to obtain a dataframe which looks like this:
row date pieces income
1 21/11/2011 49 220.5
2 22/11/2011 58 261
3 23/11/2011 23 103.5
4 24/11/2011 57 256.5
I want to create a subset for November 2011 using the code provided in my previous question, but various attempts have failed. So for a check I wrote in the console:
format.Date(sales[1,1], "%Y")=="2011"
and the answer was:
[1] FALSE
Moreover:
format(as.Date(sales[1,1]), "%d/%m/%Y")
[1] "20/11/21"
How can I, at least, know what is happening with date format?
What should I do to subset the dataframe using code like:
subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")
Sorry if my question is not clear, but the problem I am facing is not clear to me either.
(Edit to correct formatting)
Upvotes: 0
Views: 620
Reputation: 89057
Currently, what you think are dates really are just strings of characters. You need to turn them into Date
objects using as.Date
, and for that specify the format they are in (%d/%m/%Y
) or R won't guess it for you.
sales <- data.frame(date = c("21/11/2011", "21/11/2011", "23/11/2012", "24/11/2012"),
pieces = c(49,58,23,57,34),
income = c(220.5, 261, 103.5, 256.5, 112))
class(sales$date)
# [1] "factor"
sales$date <- as.Date(sales$date, "%d/%m/%Y")
class(sales$date)
# [1] "Date"
subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")
# date pieces income
# 1 2011-11-21 49 220.5
# 2 2011-11-21 58 261.0
Upvotes: 1
Reputation: 319
Just to make the answer more general, I have added another month too.
The working data looks like:
date pieces income
1 21/11/2011 49 220.5
2 22/11/2011 58 261.0
3 23/11/2011 23 103.5
4 24/11/2011 57 256.5
5 23/12/2011 50 240.0
There are many ways to do this. One I use regularly is strsplit and lapply.
sale$date1<-as.Date(sale$date, "%d/%m/%Y") # let R know the date format
# Create a column of months by splitting the dates into 3 parts and grabbing the middle
# part which is months
sale$months<-lapply(strsplit(as.character(sale$date1), "-"), function(x){x[2]})
# finally keep only the data for the month of November
required<-subset(sale[which(sale$months==11),], select=-c(months,date1))
date pieces income
1 21/11/2011 49 220.5
2 22/11/2011 58 261.0
3 23/11/2011 23 103.5
4 24/11/2011 57 256.5
Upvotes: 0