Reputation: 13
This seems like it can be a quick fix, but I have been stumped for a while now.
In my data frame, I have a date column. However, the date column is presented with a Year in one row, then all the months of that year in the next few rows in the same column, then the next year, then the months of that year, so on. Something that looks like this:
c("2011", "7", "8", "9", "10", "11", "12", "2012", "1", "2")
My question is, how do I just append that year to the end of each month? The other main issue is I am going to be automating this for a lot of different data frames that start at different months and years. So for example, one data frame might start on june of 2013, while another starts at september 2012.
I have thought about this for quite some time, and have come up with ideas such as: For each row, look at the 12 preceding rows (for how many months in a year) and find that element with a string length of 4 (for "2012" or other year; each month has a string length of 3 by the way). However, this fails because what if the data starts on say, October 2011. Then we only get 3 months for that first year and then when I run that algorithm it will go into negative values for rows and that will cause errors left and right through R.
I would really appreciate help in this. I can't seem to crack this out for the large number of data frames I need to work this through.
Upvotes: 1
Views: 134
Reputation: 269556
Use na.locf
from the zoo package to get the year for each row, year
, and then convert to "yearmon"
class (also in zoo). Finally omit the year (or not depending on what you want) and optionally format the year/month as desired.
library(zoo)
# input
dat <- c("2011", "7", "8", "9", "10", "11", "12", "2012", "1", "2")
year <- na.locf(ifelse(nchar(dat) == 4, dat, NA))
ym <- as.yearmon(paste(year, dat, sep = "-"))
and then try one of these depending on what it is you want:
na.omit(ym)
## [1] "Jul 2011" "Aug 2011" "Sep 2011" "Oct 2011" "Nov 2011" "Dec 2011" "Jan 2012" "Feb 2012"
format(na.omit(ym), "%Y-%m")
## [1] "2011-07" "2011-08" "2011-09" "2011-10" "2011-11" "2011-12" "2012-01" "2012-02"
ifelse(is.na(ym), year, format(ym))
## [1] "2011" "Jul 2011" "Aug 2011" "Sep 2011" "Oct 2011" "Nov 2011" "Dec 2011" "2012" "Jan 2012" "Feb 2012"
ifelse(is.na(ym), year, format(ym, "%Y-%m"))
## [1] "2011" "2011-07" "2011-08" "2011-09" "2011-10" "2011-11" "2011-12" "2012" "2012-01" "2012-02"
See the zoo documentation for more information on na.locf
and the "yearmon"
class.
Upvotes: 1
Reputation: 93813
Identify each group as a sequential counter, then replace all the values as requested:
ave(
x,
cumsum(grepl("\\d{4}", x)),
FUN=function(v) replace(v, -1, paste(v[-1],v[1],sep="-"))
)
# [1] "2011" "7-2011" "8-2011" "9-2011" "10-2011"
# [6] "11-2011" "12-2011" "2012" "1-2012" "2-2012"
Upvotes: 1