Reputation: 1
I'm new to R and I need some help getting a few things done. First of all, I have to analyse a large dataset (766K rows, 2 columns) in the form below:
G40 2003-04-09
Z11 1997-08-15
K60 2006-03-16
I10 2000-11-30
The dataset is named Rdiagnoses and has no header, so by default column 1 is V1 and column 2 is V2. The first column is the diagnosis and the second is the date on which it was diagnosed.
First, I was thinking of creating a separate subset for each year. This is how I'm trying to do it, but it gives me an error:
diagnoses2009 <- as.Date( as.character(Rdiagnoses$V2), "%d-%m-%y")
Rdiagnoses_2009 <- subset(Rdiagnoses, V2 >= as.Date("2009-01-01") & V2 <= as.Date("2009-12-31") )
Warning messages:
1: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for ">="
2: In eval(expr, envir, enclos) :
Incompatible methods ("Ops.factor", "Ops.Date") for "<="
Any suggestions for correcting this, or for a better way of selecting each year, are highly appreciated. Thank you in advance for your help!
Upvotes: 0
Views: 4316
Reputation: 59345
So there are a couple of things going on here.
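First, you (try to) set diagnoses2009 to a vector of dates, but your subset expression does not use that variable at all. It still compares the original V2 column, which is a factor, against Date objects, and that mismatch is what the "Incompatible methods ("Ops.factor", "Ops.Date")" warnings are complaining about.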
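A minimal sketch of the fix: convert V2 itself to Date, then compare Dates with Dates. The small data frame below is built from the four example rows in the question and only stands in for the real Rdiagnoses; since none of those rows fall in 2009, the sketch selects 2003 instead.

```r
# Stand-in for the real Rdiagnoses, built from the four rows shown in the question.
# stringsAsFactors = TRUE mimics a factor V2 column, which is what triggers the Ops.factor warning.
Rdiagnoses <- data.frame(
  V1 = c("G40", "Z11", "K60", "I10"),
  V2 = c("2003-04-09", "1997-08-15", "2006-03-16", "2000-11-30"),
  stringsAsFactors = TRUE
)

# Overwrite V2 with a proper Date vector (factor -> character -> Date)
Rdiagnoses$V2 <- as.Date(as.character(Rdiagnoses$V2), format = "%Y-%m-%d")

# Both sides of >= and <= are now Dates, so the comparison is valid
Rdiagnoses_2003 <- subset(Rdiagnoses,
                          V2 >= as.Date("2003-01-01") & V2 <= as.Date("2003-12-31"))
print(Rdiagnoses_2003)
```

With the real data, the same pattern with "2009-01-01" and "2009-12-31" gives the 2009 subset the question asks for.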
Second, as @joran points out, you are using the wrong format string: your dates are formatted as %Y-%m-%d. When you run as.Date(...) with an incorrect format string, you get NA for all the dates, so diagnoses2009 is a vector of NA.
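To see the NA behaviour directly, here is a quick check on one of the example dates from the question: the %d-%m-%y pattern cannot match a string that starts with a four-digit year, while %Y-%m-%d parses it.

```r
x <- "2003-04-09"  # one of the example dates from the question

wrong <- as.Date(x, format = "%d-%m-%y")  # pattern does not match this layout, so the result is NA
right <- as.Date(x, format = "%Y-%m-%d")  # matches year-month-day, so the result is a valid Date

print(wrong)
print(right)
```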
Third, there are much better ways to split a data frame. For example, you could do this:
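library(lubridate)  # provides year()
# Parse V2 as Date (format %Y-%m-%d), then split the rows into one data frame per year
df.subsets <- split(Rdiagnoses, year(as.Date(Rdiagnoses$V2, "%Y-%m-%d")))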
This creates a list of data frames, one for each year.
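The list returned by split() is named by year, so a single year can be pulled out by name. A minimal sketch on the four example rows from the question (a stand-in for the real data); it swaps in base R's format() in place of lubridate::year(), an alternative not taken from the answer, for anyone who would rather avoid the extra package.

```r
# Stand-in for the real Rdiagnoses, built from the four rows in the question
Rdiagnoses <- data.frame(
  V1 = c("G40", "Z11", "K60", "I10"),
  V2 = c("2003-04-09", "1997-08-15", "2006-03-16", "2000-11-30")
)

# Base-R alternative to lubridate::year(): extract the year as a "YYYY" string
yr <- format(as.Date(Rdiagnoses$V2, "%Y-%m-%d"), "%Y")

# One data frame per year; the list names are the years
by_year <- split(Rdiagnoses, yr)

print(names(by_year))     # years present in the data
print(by_year[["2003"]])  # rows diagnosed in 2003
```

The same by-name lookup works on df.subsets, e.g. df.subsets[["2009"]] for the 2009 rows.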
Finally, as @beginnerR points out, you didn't tell us anything about what you are planning to do with the split datasets. There might be a much better way to deal with your overall problem.
Upvotes: 1