Milaa

Reputation: 419

Select data by month and year in R

I have a data frame ordered by month and year. I want to keep only a whole number of years. For example, if the data start in July 2002 and end in September 2010, select only July 2002 to June 2010. If the data start in September 1992 and end in March 2000, select only September 1992 to August 1999. This should work regardless of any missing months in between.

The data can be downloaded from the following link: enter link description here

The code

mydata <- read.csv("E:/mydata.csv", stringsAsFactors=TRUE)

This is the manual selection:

selected.data <- mydata[1:73,]   # July 2002 to June 2010 

How can I achieve this with code?

Upvotes: 0

Views: 277

Answers (2)

Ronak Shah

Reputation: 388982

Here's a base R one-liner:

result <- mydata[seq_len(with(mydata, which(Month == month.name[match(Month[1],
                         month.name) - 1] & Year == max(Year)))), ]

head(result)

#     Month Year       var
#1     July 2002 -91.22997
#2  October 2002 -91.19007
#3 December 2002 -91.05395
#4 February 2003 -91.16958
#5    March 2003 -91.17881
#6    April 2003 -91.15110

tail(result)
#      Month Year       var
#68 December 2009 -90.92610
#69  January 2010 -91.07379
#70 February 2010 -91.12460
#71    March 2010 -91.10288
#72    April 2010 -91.06040
#73     June 2010 -90.94212 
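
The one-liner is dense, so here is a step-by-step sketch of the same idea on made-up data. The `toy` data frame and the names `first_month`, `prev_month`, and `last_row` are assumptions for illustration, not part of the original answer. The modulo step is a small variation that wraps January back to December, a case the one-liner does not handle.

```r
# Toy data (hypothetical): every month from July 2002 to September 2010
k <- 0:98
toy <- data.frame(
  Month = month.name[(6 + k) %% 12 + 1],
  Year  = 2002 + (6 + k) %/% 12
)

# Position of the first month in the calendar (7 for July)
first_month <- match(toy$Month[1], month.name)

# Name of the preceding month; the modulo wraps January back to December
prev_month <- month.name[(first_month - 2) %% 12 + 1]

# Row holding that month in the last year present in the data
last_row <- which(toy$Month == prev_month & toy$Year == max(toy$Year))

# Keep everything up to and including that row
result <- toy[seq_len(last_row), ]
tail(result, 1)
```

Like the one-liner, this sketch relies on the preceding month actually being present in the last year of the data.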

Upvotes: 1

Elia

Reputation: 2584

Here is a base R solution that reproduces your manual subsetting:

mydata <- read.csv("D:/mydata.csv", stringsAsFactors = FALSE)
lookup <-
  c(
    January = 1,
    February = 2,
    March = 3,
    April = 4,
    May = 5,
    June = 6,
    July = 7,
    August = 8,
    September = 9,
    October = 10,
    November = 11,
    December = 12
  )
mydata$Month <- unlist(lapply(mydata$Month, function(x) lookup[match(x, names(lookup))]))

first.month <- mydata$Month[1]
last.year <- max(mydata$Year)
mydata[1:which(mydata$Month == (first.month - 1) & mydata$Year == last.year), ]

Basically, I convert the month names to numbers and then find, in the last year of the data frame, the month preceding the first month that appears in the data frame. Everything up to and including that row is kept.
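
Looking up the preceding month in the last year works for the July 2002 to September 2010 data, but it would find no matching row when that month falls in an earlier year (for instance, data running from September 1992 to March 2000 have no August 2000), or when that month happens to be missing. A possible alternative, sketched below under stated assumptions, is to turn each row into a running month index and keep only the rows that fall inside complete 12-month blocks. The helper name `select_whole_years` and the `toy` data are hypothetical; the sketch assumes columns named `Month` (full English month names) and `Year` (numeric).

```r
# Hypothetical helper: keep only a whole number of years, counted from the
# earliest month in the data. Assumes columns Month (full English month
# names) and Year (numeric).
select_whole_years <- function(df) {
  # Running month index, so consecutive calendar months differ by exactly 1
  idx <- df$Year * 12 + match(df$Month, month.name)
  # Number of complete 12-month blocks spanned by the data
  n_years <- (max(idx) - min(idx) + 1) %/% 12
  # Keep the rows that fall inside those complete blocks
  df[idx < min(idx) + 12 * n_years, , drop = FALSE]
}

# Toy data (made up): September 1992 to March 2000, with two months removed
k <- 0:90
toy <- data.frame(
  Month = month.name[(8 + k) %% 12 + 1],
  Year  = 1992 + (8 + k) %/% 12
)
toy <- toy[-c(2, 50), ]

res <- select_whole_years(toy)
head(res, 1)  # first row kept
tail(res, 1)  # last row kept
```

Because the cutoff is computed from the month index rather than located by matching a specific row, gaps in the middle of the series do not affect it.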

Upvotes: 2
