Reputation: 61
I have a column of dates from which I am trying to create a list of years for each row. For example, here are a few rows of my data:
1997-2001
1994
2007-2009; 2013-2015; 2016
2007-2008; 2014
For the first row I want a list containing 1997, 1998, 1999, 2000 and 2001. For the second row I want a list containing just 1994. For the third row I want a list containing 2007, 2008, 2009, 2013, 2014, 2015 and 2016, and so on. Is there a way to do this?
Upvotes: 1
Views: 588
Reputation: 391
bgoldst's answer resolved the problem but here's another way you could do it.
You can use gsub to convert your semicolons to commas and your dashes to colons like so (where df is the data frame and x is the column containing the data):
df$x<-gsub("-",":",df$x)
df$x<-gsub(";",",",df$x)
which would give you:
1997:2001
1994
2007:2009, 2013:2015, 2016
2007:2008, 2014
Then use a for-loop to evaluate all those expressions:
years <- list()
for (i in seq_len(nrow(df))) {
  # build a string such as "c( 1997:2001 )" and evaluate it as R code
  years[i] <- list(eval(parse(text = paste("c(", df$x[i], ")"))))
}
As noted in bgoldst's answer, if your input is a vector of factors rather than characters, you will need to replace df$x[i] with as.character(df$x[i]).
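The steps above can be put together into one runnable sketch. The sample data is taken from the question and the column name x follows this answer; the grepl() check is my own addition (an assumption, not part of the original approach), included because eval(parse()) will run whatever text it is given.

```r
# Sample data from the question; the column name x follows this answer.
df <- data.frame(x = c("1997-2001", "1994",
                       "2007-2009; 2013-2015; 2016", "2007-2008; 2014"),
                 stringsAsFactors = FALSE)

# Rewrite each entry as R syntax: "2007-2009; 2016" becomes "2007:2009, 2016".
# as.character() keeps this safe if the column is a factor.
expr <- gsub(";", ",", gsub("-", ":", as.character(df$x)))

# Added safeguard (assumption): allow only digits, colons, commas and spaces,
# since eval(parse()) executes arbitrary text.
stopifnot(all(grepl("^[0-9:, ]+$", expr)))

# Evaluate "c(...)" for each row; lapply() returns the list directly.
years <- lapply(expr, function(e) eval(parse(text = paste0("c(", e, ")"))))

years[[3]]
## [1] 2007 2008 2009 2013 2014 2015 2016
```

Using lapply() instead of the for-loop avoids growing the list one element at a time, but the result is the same.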
Upvotes: 2
Reputation: 35314
It's ugly, but it gets the job done:
lapply(strsplit(df$date, '\\s*;\\s*'), function(x) unlist(lapply(strsplit(x, '-'), function(y) {
  z <- as.integer(y)
  if (length(z) == 1L) z else z[1L]:z[2L]
})))
## [[1]]
## [1] 1997 1998 1999 2000 2001
##
## [[2]]
## [1] 1994
##
## [[3]]
## [1] 2007 2008 2009 2013 2014 2015 2016
##
## [[4]]
## [1] 2007 2008 2014
##
Data
df <- data.frame(date=c('1997-2001','1994','2007-2009; 2013-2015; 2016','2007-2008; 2014'),
                 stringsAsFactors=FALSE)
Note: If your input vector is a factor, as opposed to a character vector, then you'll have to wrap it in as.character() before passing it to the initial strsplit() call.
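If the one-liner is hard to read, the same logic can be wrapped in a small function. This is only a sketch: expand_years() is a hypothetical name of my own, and the as.character() call is folded in so that factor input works without extra steps.

```r
# expand_years() is a hypothetical helper name, not from the original answer.
expand_years <- function(dates) {
  # as.character() makes factor input behave like character input.
  entries <- strsplit(as.character(dates), "\\s*;\\s*")
  lapply(entries, function(parts) {
    # Each part is either a single year ("2016") or a range ("2007-2009").
    unlist(lapply(strsplit(trimws(parts), "-", fixed = TRUE), function(bounds) {
      yrs <- as.integer(bounds)
      if (length(yrs) == 1L) yrs else seq(yrs[1L], yrs[2L])
    }))
  })
}

# Same rows as the question, deliberately stored as a factor.
dates <- factor(c("1997-2001", "1994", "2007-2009; 2013-2015; 2016", "2007-2008; 2014"))
years <- expand_years(dates)
years[[4]]
## [1] 2007 2008 2014
```

To store the result alongside the data, something like df$years <- expand_years(df$date) should work, since a data frame column can hold a list.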
Upvotes: 4