John James Pork
John James Pork

Reputation: 181

I am unable to specify the correct date format to subset a dataframe in R

This question is related to my previous one, Subsetting a dataframe for a specified month and year

I use the command

sales <- read.csv("mysales.csv", colClasses="character")

to obtain a dataframe which looks like this:

    row     date            pieces       income
    1       21/11/2011      49           220.5
    2       22/11/2011      58           261
    3       23/11/2011      23           103.5
    4       24/11/2011      57           256.5

I want to create a subset for November 2011 using the code provided in my previous question, but various attempts have failed. So for a check I wrote in the console:

format.Date(sales[1,1], "%Y")=="2011"

and the answer was:

[1] FALSE

Moreover:

format(as.Date(sales[1,1]), "%d/%m/%Y")
[1] "20/11/21"

How can I, at least, know what is happening with date format?

What should I do to subset the dataframe using code like:

subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")

Sorry if my question is not clear, but the problem I am facing is not clear to me either.

(Edit to correct formatting)

Upvotes: 0

Views: 620

Answers (2)

flodel
flodel

Reputation: 89057

Currently, what you think are dates really are just strings of characters. You need to turn them into Date objects using as.Date, and for that specify the format they are in (%d/%m/%Y) or R won't guess it for you.

sales <- data.frame(date   = c("21/11/2011", "21/11/2011", "23/11/2012", "24/11/2012"),
                    pieces = c(49,58,23,57,34),
                    income = c(220.5, 261, 103.5, 256.5, 112))
class(sales$date)
# [1] "factor"
sales$date <- as.Date(sales$date, "%d/%m/%Y")
class(sales$date)
# [1] "Date"
subset(sales, format.Date(date, "%m")=="11" & format.Date(date, "%Y")=="2011")
#         date pieces income
# 1 2011-11-21     49  220.5
# 2 2011-11-21     58  261.0

Upvotes: 1

user1234357
user1234357

Reputation: 319

Just to make the answer more general, I have added another month too.

The working data looks like:

           date pieces income
1 21/11/2011     49  220.5
2 22/11/2011     58  261.0
3 23/11/2011     23  103.5
4 24/11/2011     57  256.5
5 23/12/2011     50  240.0

There are many ways to do this. One I use regularly is strsplit and lapply.

sale$date1<-as.Date(sale$date, "%d/%m/%Y")  # let R know the date format

# Create a column of months by splitting the dates into 3 parts and grabbing the middle 
# part which is months
sale$months<-lapply(strsplit(as.character(sale$date1), "-"), function(x){x[2]}) 

# finally keep only the data for the month of November 

required<-subset(sale[which(sale$months==11),], select=-c(months,date1))

        date pieces income
1 21/11/2011     49  220.5
2 22/11/2011     58  261.0
3 23/11/2011     23  103.5
4 24/11/2011     57  256.5

Upvotes: 0

Related Questions