Learner

Reputation: 118

Validate column values in two different data frames and append mismatched values to an existing data frame

I have two data frames, df1 and df2, each with two columns, Curr_Time and Curr_Date. I want to compare the values of Curr_Time in both data frames: if the values are the same, do nothing; if they are different, append the new values to df1.

I am dealing with streaming data, where df2 has only one row holding the latest value. My aim is to append the new values in df2 to df1 if and only if df2$Curr_Time != df1$Curr_Time. Currently, every value is appended to df1 regardless of this condition.

df2: This has only one row, which is updated every 5 seconds

     Curr_Time        Curr_Date
     11:45:34         10-04-2018

df1: Currently a new row is appended every 5 seconds without validating the values, which results in redundant rows.

    Curr_Time         Curr_Date
    11:43:34         10-04-2018
    11:43:34         10-04-2018
    11:45:34         10-04-2018
    11:45:34         10-04-2018 

Expected Output of df1

    Curr_Time       Curr_Date  
    11:43:34        10-04-2018
    11:45:34        10-04-2018

Below is my R Code.

    library(tcltk2)

    df1 <- data.frame(stringsAsFactors=FALSE)
    df2 <- data.frame(stringsAsFactors=FALSE)

    frameupdate <- function() {
      if (nrow(df1) == 0)
        df1 <<- df2
      else
        df1 <<- rbind(df1, df2)
    }

    tclTaskSchedule(5000, frameupdate(), id = "frameupdate", redo = TRUE)

Upvotes: 0

Views: 89

Answers (2)

SunLisa

Reputation: 134

As @cephalopod mentioned, anti_join is a good way here.

You want to check whether the record in df2 is already included in df1.

You can do as @Stephan mentioned: append everything without checking for duplicates, then run distinct() to keep only the distinct records.

Or you can check every time inside your function, either manually or with dplyr's anti_join function.

Here is an example using dplyr.

First, I assume df1 should not contain duplicated records (as would be the case if the logic had been right from the very start):

    library(dplyr)

    df1 <- df1 %>% unique()
    head(df1)
      Curr_Time  Curr_Date
    1  11:43:34 10-04-2018
    3  11:45:34 10-04-2018

I created another record df2.new as an example for a new record that should be appended to df1:

    df2.new
      Curr_Time  Curr_Date
    1  11:45:57 10-04-2018

For example:

    df2.new %>% anti_join(df1)
    Joining, by = c("Curr_Time", "Curr_Date")
      Curr_Time  Curr_Date
    1  11:45:57 10-04-2018

    df2 %>% anti_join(df1)
    Joining, by = c("Curr_Time", "Curr_Date")
    [1] Curr_Time Curr_Date
    <0 rows> (or 0-length row.names)

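It would work even if your df1 had no rows, as long as df1 already has the Curr_Time and Curr_Date columns (anti_join needs common columns to join on). Therefore you can update your function like this:

    frameupdate <- function() {
      # keep only the rows of df2 that are not already in df1, then append them
      df1 <<- rbind(df1, anti_join(df2, df1))
    }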
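A minimal sketch of the anti_join variant as a self-contained script, assuming df1 is initialised with both columns and zero rows so the join has keys to work with. The helper name append_new_rows and the explicit by argument are additions for illustration, not part of the original code.

```r
library(dplyr)

# Hypothetical helper: returns df1 plus any rows of df2 not already in df1.
append_new_rows <- function(df1, df2) {
  new_rows <- anti_join(df2, df1, by = c("Curr_Time", "Curr_Date"))
  rbind(df1, new_rows)
}

# Assumption: df1 starts with the right columns but no rows.
df1 <- data.frame(Curr_Time = character(), Curr_Date = character(),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Curr_Time = "11:45:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)

df1 <- append_new_rows(df1, df2)  # row is new, so it is appended
df1 <- append_new_rows(df1, df2)  # same row again, nothing is appended
nrow(df1)
```

Passing by explicitly also silences the "Joining, by = ..." message on every call.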

Or you could write the check by hand, like this:

    frameupdate <- function() {
      if (nrow(df1[df1$Curr_Time == df2$Curr_Time & df1$Curr_Date == df2$Curr_Date, ]) == 0)
        df1 <<- rbind(df1, df2)
    }

    frameupdate()

Running this function would get the expected output, even when df1 is empty.

    df1
      Curr_Time  Curr_Date
    1  11:43:34 10-04-2018
    2  11:45:34 10-04-2018
    3  11:45:57 10-04-2018
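The same check can also be written as a function that takes and returns data frames instead of relying on <<-, which makes it easier to try out in isolation. This is only a sketch in base R, assuming both columns are character vectors and df2 holds a single row; append_if_new is a hypothetical name.

```r
# Hypothetical helper: append the single row in df2 only if no row of df1
# has the same Curr_Time and Curr_Date.
append_if_new <- function(df1, df2) {
  already_seen <- any(df1$Curr_Time == df2$Curr_Time &
                      df1$Curr_Date == df2$Curr_Date)
  if (already_seen) df1 else rbind(df1, df2)
}

df1 <- data.frame(stringsAsFactors = FALSE)   # empty, as in the question
df2 <- data.frame(Curr_Time = "11:43:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)

df1 <- append_if_new(df1, df2)   # first row, appended
df1 <- append_if_new(df1, df2)   # duplicate, skipped
df2$Curr_Time <- "11:45:34"
df1 <- append_if_new(df1, df2)   # new time, appended
df1$Curr_Time
```

any() on an empty comparison returns FALSE, which is why the completely empty df1 needs no special case here.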

Upvotes: 1

Stephan

Reputation: 2246

After your if/else statement you can follow up with a simple validation step that drops duplicate rows:

    library(dplyr)
    # assign the result back to df1 if you want to keep it
    df1 %>%
      distinct()

which gives you:

    # A tibble: 2 x 2
      Curr_Time Curr_Date 
       <time>    <chr>     
    1 11:43     10-04-2018
    2 11:45     10-04-2018
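One way this could be folded into the scheduled function is to append unconditionally and then deduplicate in the same step. This is a sketch only, assuming df1 and df2 live in the global environment as in the question; the sample df2 row is made up for the demonstration.

```r
library(dplyr)

df1 <- data.frame(stringsAsFactors = FALSE)
df2 <- data.frame(Curr_Time = "11:43:34", Curr_Date = "10-04-2018",
                  stringsAsFactors = FALSE)

# Append unconditionally, then drop exact duplicate rows.
frameupdate <- function() {
  df1 <<- distinct(rbind(df1, df2))
}

frameupdate()
frameupdate()   # same df2 again; distinct() removes the repeat
nrow(df1)
```

Note that this rescans all of df1 on every tick, so for a long-running stream checking only the incoming row may scale better.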

Upvotes: 1
