user3665359

Reputation: 1

Creating a subset in R using date

I'm new to R and need some help. I have to analyse a large dataset (766K rows, 2 columns) in the form below:

G40 2003-04-09
Z11 1997-08-15
K60 2006-03-16
I10 2000-11-30

The dataset is named Rdiagnoses and has no header, so by default the columns are V1 and V2. The first column is the diagnosis and the second is the date on which it was made. My first idea was to create a separate subset for each year. This is how I'm trying to do it, but it produces the warnings below.

diagnoses2009 <- as.Date( as.character(Rdiagnoses$V2), "%d-%m-%y")

Rdiagnoses_2009 <- subset(Rdiagnoses, V2 >= as.Date("2009-01-01") & V2 <= as.Date("2009-12-31") )

 Warning messages:

1: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for ">="

2: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for "<="

Any suggestions for correcting this, or a better way of selecting each year, would be highly appreciated. Thank you in advance for your help!

Upvotes: 0

Views: 4316

Answers (1)

jlhoward

Reputation: 59345

So there are a couple of things going on here.

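First, you (try to) set diagnoses2009 to a set of dates, but your subset expression does not use that variable at all. It compares V2, which is still a factor, against a Date, and that is what triggers the "Incompatible methods" warnings.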
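A minimal sketch of one way to address this first point: convert the V2 column itself, then subset on it. It assumes the dates are year-month-day strings as in the sample rows. The small data frame below is a hypothetical stand-in for Rdiagnoses, and the "J45" row is made up so that the 2009 subset is not empty.

```r
# Hypothetical stand-in for Rdiagnoses; V2 is a factor, as in the question
Rdiagnoses <- data.frame(
  V1 = c("G40", "Z11", "K60", "I10", "J45"),
  V2 = factor(c("2003-04-09", "1997-08-15", "2006-03-16",
                "2000-11-30", "2009-06-01"))  # last row is invented
)

# Convert the column in place so that later comparisons see Date values
Rdiagnoses$V2 <- as.Date(as.character(Rdiagnoses$V2), format = "%Y-%m-%d")

# Both sides of each comparison are now Dates, so no Ops.factor warning
Rdiagnoses_2009 <- subset(
  Rdiagnoses,
  V2 >= as.Date("2009-01-01") & V2 <= as.Date("2009-12-31")
)
```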

Second, as @joran points out, you are using the wrong format string: your dates are formatted as %Y-%m-%d. When you run as.Date(...) with a format string that does not match the data, you get NA for all the dates. So diagnoses2009 is a vector of NA.
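To see the difference, here is a small check using the four sample dates from the question. The variable names are only illustrative.

```r
# Sample dates copied from the question
dates <- c("2003-04-09", "1997-08-15", "2006-03-16", "2000-11-30")

# Format from the question: day-month-two-digit-year, which does not match
wrong <- as.Date(dates, format = "%d-%m-%y")

# Matching format: four-digit year, month, day
right <- as.Date(dates, format = "%Y-%m-%d")

all(is.na(wrong))   # TRUE: every value failed to parse
format(right, "%Y") # the years, as character strings
```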

Third, there are much better ways to split a dataframe. You could do this for example:

library(lubridate)
# df is your data frame (Rdiagnoses in the question)
df.subsets <- split(df, year(as.Date(df$V2, "%Y-%m-%d")))

This creates a list of data frames, one for each year.
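As a rough sketch of how such a list can be used, the example below splits a tiny stand-in data frame and pulls out one year by name. It swaps lubridate's year() for base R's format(..., "%Y") so that it runs without extra packages; the data frame and the names by_year and rows_per_year are assumptions for illustration.

```r
# Hypothetical stand-in for the real data, using the rows from the question
df <- data.frame(
  V1 = c("G40", "Z11", "K60", "I10"),
  V2 = c("2003-04-09", "1997-08-15", "2006-03-16", "2000-11-30")
)

# Extract the year as a string and split on it (base R only)
yr <- format(as.Date(df$V2, format = "%Y-%m-%d"), "%Y")
by_year <- split(df, yr)

# The list is named by year, so a single year is looked up by its name
by_year[["2003"]]

# Apply the same summary to every year, here the number of rows
rows_per_year <- sapply(by_year, nrow)
```

Keeping the pieces in one list means a per-year analysis can be written once and applied with sapply or lapply, rather than repeated for each yearly data frame.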

Finally, as @beginnerR points out, you didn't tell us anything about what you are planning to do with the split datasets. There might be a much better way to deal with your overall problem.

Upvotes: 1
