user3682294

Reputation: 21

How do I subset data based on a date range in R?

I have a large .txt data file and I need to subset based on a date range.

head(newFile)
        Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
1 16/12/2006 17:24:00               4.216                 0.418  234.84             18.4
2 16/12/2006 17:25:00               5.360                 0.436  233.63             23.0
3 16/12/2006 17:26:00               5.374                 0.498  233.29             23.0
4 16/12/2006 17:27:00               5.388                 0.502  233.74             23.0
5 16/12/2006 17:28:00               3.666                 0.528  235.68             15.8
6 16/12/2006 17:29:00               3.520                 0.522  235.02             15.0
  Sub_metering_1 Sub_metering_2 Sub_metering_3
1              0              1             17
2              0              1             16
3              0              2             17
4              0              1             17
5              0              1             17
6              0              2             17

I only need to use the data from the dates 2007-02-01 and 2007-02-02.

I think I need to convert the Date and Time variables to date/time classes in R using the strptime() and as.Date() functions, but I'm not clear on how to do that.

What is the simplest/cleanest way to do this?

Upvotes: 1

Views: 903

Answers (1)

rischan

Reputation: 1585

You can use the lubridate library. The code below is only an example; I changed a few dates in your data so that the filter has rows to select.

library(lubridate)

> df <- read.table("test2.txt", header=TRUE)
> df
        Date     Time Global_active_power Global_reactive_power Voltage
1 16/12/2006 17:24:00               4.216                 0.418  234.84
2 16/12/2006 17:25:00               5.360                 0.436  233.63
3 16/12/2007 17:26:00               5.374                 0.498  233.29
4 16/12/2007 17:27:00               5.388                 0.502  233.74
5 16/12/2006 17:28:00               3.666                 0.528  235.68
  Global_intensity
1             18.4
2             23.0
3             23.0
4             23.0
5             15.8
> date1 = dmy("04/06/2007")
> date2 = dmy("04/06/2009")
> df[ dmy(df$Date) >= date1 & dmy(df$Date) <= date2 , ]
        Date     Time Global_active_power Global_reactive_power Voltage
3 16/12/2007 17:26:00               5.374                 0.498  233.29
4 16/12/2007 17:27:00               5.388                 0.502  233.74
  Global_intensity
3               23
4               23
> 
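
For the dates in the question (2007-02-01 and 2007-02-02), the same idea can be sketched in base R with as.Date() and strptime(), without lubridate. This is only a minimal sketch: the small data frame below is made up to stand in for newFile, and the DateTime column name is my own choice, not something from the original data.

```r
# Made-up stand-in for the asker's newFile (values are illustrative only).
newFile <- data.frame(
  Date = c("31/01/2007", "01/02/2007", "01/02/2007", "02/02/2007", "03/02/2007"),
  Time = c("23:59:00", "00:00:00", "00:01:00", "23:59:00", "00:00:00"),
  Global_active_power = c(1.1, 0.3, 0.4, 3.7, 3.6),
  stringsAsFactors = FALSE
)

# Parse day/month/year strings into Date objects.
# as.character() guards against the column having been read in as a factor.
dates <- as.Date(as.character(newFile$Date), format = "%d/%m/%Y")

# Keep rows inside the closed range. Combine the two conditions with `&`
# and leave the column index empty (the trailing comma) to keep all columns.
feb <- newFile[dates >= as.Date("2007-02-01") & dates <= as.Date("2007-02-02"), ]

# Optionally combine Date and Time into a single date-time column.
# "DateTime" is a hypothetical column name; tz = "UTC" is an assumption.
feb$DateTime <- as.POSIXct(
  strptime(paste(feb$Date, feb$Time), format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
)
```

On the real file you would run the same three steps against your own newFile, and set tz to whatever time zone the measurements were recorded in.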

Upvotes: 3
