Christine
Christine

Reputation: 31

Alter values in one data frame based on comparison values in another in R

I am trying to subtract one hour to date/times within a POSIXct column that are earlier than or equal to a time stated in a different comparison dataframe for that particular ID.

For example:

#create sample data
Time<-as.POSIXct(c("2015-10-02 08:00:00","2015-11-02 11:00:00","2015-10-11 10:00:00","2015-11-11 09:00:00","2015-10-24 08:00:00","2015-10-27 08:00:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,01,02,02,03,03)
data<-data.frame(Time,ID)

Which produces this:

                 Time ID
1 2015-10-02 08:00:00  1
2 2015-11-02 11:00:00  1
3 2015-10-11 10:00:00  2
4 2015-11-11 09:00:00  2
5 2015-10-24 08:00:00  3
6 2015-10-27 08:00:00  3

I then have another dataframe with a key date and time for each ID to compare against. The Time in data should be compared against Comparison in ComparisonData for the particular ID it is associated with. If the Time value in data is earlier than or equal to the comparison value one hour should be subtracted from the value in data:

#create sample comparison data
Comparison<-as.POSIXct(c("2015-10-29 08:00:00","2015-11-02 08:00:00","2015-10-26 08:30:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,02,03)
ComparisonData<-data.frame(Comparison,ID)

This should look like this:

           Comparison  ID
1 2015-10-29 08:00:00   1
2 2015-11-02 08:00:00   2
3 2015-10-26 08:30:00   3

In summary, the code should check all times of a certain ID to see if any are earlier than or equal to the value specified in ComparisonData and if they are, subtract one hour. This should give this data frame as an output:

                 Time  ID
1 2015-10-02 07:00:00   1
2 2015-11-02 11:00:00   1
3 2015-10-11 09:00:00   2
4 2015-11-11 09:00:00   2
5 2015-10-24 07:00:00   3
6 2015-10-27 08:00:00   3

I have looked at similar solutions such as this but I cannot understand how to also check the times using the right timing with that particular ID.

I think ddply seems quite a promising option but I'm not sure how to use it for this particular problem.

Upvotes: 1

Views: 69

Answers (2)

David Arenburg
David Arenburg

Reputation: 92292

Here's a quick and efficient solution using data.table. First we join the two data sets by ID and then just modify the Times which are lower or equal to Comparison

library(data.table) # v1.9.6+
setDT(data)[ComparisonData, end := i.Comparison, on = "ID"]
data[Time <= end, Time := Time - 3600L][, end := NULL]
data
#                   Time ID
# 1: 2015-10-02 07:00:00  1
# 2: 2015-11-02 11:00:00  1
# 3: 2015-10-11 09:00:00  2
# 4: 2015-11-11 09:00:00  2
# 5: 2015-10-24 07:00:00  3
# 6: 2015-10-27 08:00:00  3

Alternatively, we could do this in one step while joining using ifelse (not sure how efficient this though)

setDT(data)[ComparisonData, 
            Time := ifelse(Time <= i.Comparison, 
                           Time - 3600L, Time), 
            on = "ID"]
data
#                   Time ID
# 1: 2015-10-02 07:00:00  1
# 2: 2015-11-02 11:00:00  1
# 3: 2015-10-11 09:00:00  2
# 4: 2015-11-11 09:00:00  2
# 5: 2015-10-24 07:00:00  3
# 6: 2015-10-27 08:00:00  3

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388982

I am sure there is going to be a better solution than this, however, I think this works.

for(i in 1:nrow(data)) {
  if(data$Time[i] < ComparisonData[data$ID[i], 1]){
   data$Time[i] <- data$Time[i] - 3600
  }
}



#          Time ID
#1 2015-10-02 07:00:00  1
#2 2015-11-02 11:00:00  1
#3 2015-10-11 09:00:00  2
#4 2015-11-11 09:00:00  2
#5 2015-10-24 07:00:00  3
#6 2015-10-27 08:00:00  3

This is going to iterate through every row in data.

ComparisonData[data$ID[i], 1] gets the time column in ComparisonData for the corresponding ID. If this is greater than the Time column in data then reduce the time by 1 hour.

Upvotes: 0

Related Questions