Reputation: 419
I have a data frame ordered by month and year. I want to select only a whole number of years. For example, if the data start in July 2002 and end in September 2010, select only the data from July 2002 to June 2010; if the data start in September 1992 and end in March 2000, select only the data from September 1992 to August 1999. This should hold regardless of any missing months in between.
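The rule above is plain month arithmetic: put every (year, month) pair on one running scale, count how many complete 12-month blocks fit between the first and last month, and step back one month from the end of those blocks. A minimal sketch, where `full_year_end` is a hypothetical helper name (not from any package) that takes month numbers 1 to 12:

```r
# Hypothetical helper: given the first and last (year, month) of the data,
# return the year and month that close the last complete 12-month block.
full_year_end <- function(start_year, start_month, end_year, end_month) {
  # Put both dates on one running month scale: year * 12 + (month - 1)
  start_idx <- start_year * 12 + (start_month - 1)
  end_idx   <- end_year * 12 + (end_month - 1)
  # Number of complete years covered, counting both endpoint months
  n_years <- (end_idx - start_idx + 1) %/% 12
  # Last month of the final complete year
  cut_idx <- start_idx + 12 * n_years - 1
  c(year = cut_idx %/% 12, month = cut_idx %% 12 + 1)
}

full_year_end(2002, 7, 2010, 9)  # year = 2010, month = 6 (June 2010)
full_year_end(1992, 9, 2000, 3)  # year = 1999, month = 8 (August 1999)
```

If the data span fewer than 12 months, `n_years` is 0 and the result is the month before the start, i.e. nothing would be selected.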
The data can be downloaded from the following link: enter link description here
The code:
mydata <- read.csv("E:/mydata.csv", stringsAsFactors=TRUE)
This is the manual selection:
selected.data <- mydata[1:73,] # July 2002 to June 2010
How can I achieve this programmatically?
Upvotes: 0
Views: 277
Reputation: 388982
Here's a base R one-liner:
result <- mydata[seq_len(with(mydata, which(Month == month.name[match(Month[1],
month.name) - 1] & Year == max(Year)))), ]
head(result)
# Month Year var
#1 July 2002 -91.22997
#2 October 2002 -91.19007
#3 December 2002 -91.05395
#4 February 2003 -91.16958
#5 March 2003 -91.17881
#6 April 2003 -91.15110
tail(result)
# Month Year var
#68 December 2009 -90.92610
#69 January 2010 -91.07379
#70 February 2010 -91.12460
#71 March 2010 -91.10288
#72 April 2010 -91.06040
#73 June 2010 -90.94212
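The one-liner relies on a row for the month before the first month existing in the last year. For the second example in the question (September 1992 to March 2000) there is no August 2000 row, so `which()` returns `integer(0)` and `seq_len()` stops with an error. A minimal sketch of a variant that compares a running month index (year * 12 + month - 1, an assumption of this sketch rather than part of the answer) against the cutoff instead of searching for an exact row; the toy data below are made up and only mimic the question's `Month`, `Year`, `var` layout:

```r
# Made-up toy data in the question's layout, September 1992 to March 2000,
# with most months missing
mydata <- data.frame(
  Month = c("September", "November", "January", "August", "October", "March"),
  Year  = c(1992, 1992, 1993, 1999, 1999, 2000),
  var   = c(1.5, 2.1, 0.7, 3.3, 1.9, 2.4),
  stringsAsFactors = FALSE
)

# Running month index; match() also accepts a factor Month column
idx <- with(mydata, Year * 12 + match(Month, month.name) - 1)
first <- min(idx)
last  <- max(idx)
# Complete years covered, then keep rows before the end of the last full year
n_years <- (last - first + 1) %/% 12
result  <- mydata[idx < first + 12 * n_years, ]
result  # rows from September 1992 through August 1999
```

Because rows are kept by a comparison rather than by position, gaps in the series and a missing final month do not matter, and the data need not be sorted.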
Upvotes: 1
Reputation: 2584
Here is a base R solution that reproduces your manual subsetting:
mydata <- read.csv("D:/mydata.csv", stringsAsFactors = FALSE)
lookup <-
c(
January = 1,
February = 2,
March = 3,
April = 4,
May = 5,
June = 6,
July = 7,
August = 8,
September = 9,
October = 10,
November = 11,
December = 12
)
# Convert month names to numbers by indexing the named lookup vector
mydata$Month <- unname(lookup[mydata$Month])
first.month <- mydata$Month[1]
last.year <- max(mydata$Year)
mydata[1:which(mydata$Month==(first.month -1)&mydata$Year==last.year),]
Basically, I convert the month names to numbers, find the row holding the month that precedes the first month of the data frame in the data frame's last year, and keep every row up to and including it.
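Two small points about the month handling above, as a sketch: R's built-in `month.name` constant can replace the hand-written lookup, and `first.month - 1` gives 0 when the series starts in January, which matches no row. Modular arithmetic wraps January back to December; `prev_month` is a hypothetical helper name used only for illustration:

```r
# Month names to numbers without a hand-written lookup (month.name is built in)
months <- c("January", "March", "July", "December")
month_num <- match(months, month.name)
month_num               # 1 3 7 12

# Month preceding m, wrapping January (1) back to December (12)
prev_month <- function(m) (m - 2) %% 12 + 1
prev_month(month_num)   # 12 2 6 11
```

This only fixes the month number; finding that month in the last year still assumes the corresponding row exists.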
Upvotes: 2