Reputation: 118
So I have two data frames, df1 and df2, and both have the same two columns, Curr_Time and Curr_Date. I need to compare the Curr_Time values in the two data frames: if the values are the same, do nothing; if they differ, append the new values to df1.
I am working with streaming data in which df2 has only one row holding the latest value. My aim is to append the new values in df2 to df1 if and only if df2$Curr_Time != df1$Curr_Time, that is, only when the latest time is not already present in df1. Currently, every new value is appended to df1 regardless of this logic.
df2: this has only one row, which is updated every 5 seconds:
Curr_Time Curr_Date
11:45:34 10-04-2018
df1: currently a new row is appended every 5 seconds without checking the values, which results in duplicate rows:
Curr_Time Curr_Date
11:43:34 10-04-2018
11:43:34 10-04-2018
11:45:34 10-04-2018
11:45:34 10-04-2018
Expected output of df1:
Curr_Time Curr_Date
11:43:34 10-04-2018
11:45:34 10-04-2018
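Below is my R code:
library(tcltk2)

# Start with empty data frames; df2 is refreshed by the data stream
df1 <- data.frame(stringsAsFactors = FALSE)
df2 <- data.frame(stringsAsFactors = FALSE)

# Called every 5 seconds: copies df2 into df1 on the first call, otherwise
# appends df2 to df1 with no duplicate check (the source of the repeated rows)
frameupdate <- function() {
  if (nrow(df1) == 0)
    df1 <<- df2
  else
    df1 <<- rbind(df1, df2)
}

# Re-run frameupdate() every 5000 ms
tclTaskSchedule(5000, frameupdate(), id = "frameupdate", redo = TRUE)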
Upvotes: 0
Views: 89
Reputation: 134
As @cephalopod mentioned, anti_join is a good fit here.
You want to check whether the record in df2 is already included in df1.
You can do as @Stephan mentioned: append everything without checking for duplicates, then run distinct() to keep only the distinct records.
Alternatively, you can check every time inside your function, for example with dplyr's anti_join() function.
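Either way, the core test is whether the incoming df2 row is already present in df1. Below is a minimal base R sketch of that test, assuming both data frames hold Curr_Time and Curr_Date as character (or factor) columns; is_new_record is a hypothetical helper name, not part of dplyr or the question's code:

```r
# Hypothetical helper (not from dplyr or the original code): returns TRUE when
# the row in `incoming` is not yet in `existing`, comparing Curr_Time and
# Curr_Date together via a combined "time|date" key.
is_new_record <- function(incoming, existing) {
  if (nrow(existing) == 0) return(TRUE)  # nothing stored yet, so it is new
  key <- function(d) paste(d$Curr_Time, d$Curr_Date, sep = "|")
  !(key(incoming) %in% key(existing))
}

# Possible use inside the scheduled task (df1, df2 as in the question):
# if (is_new_record(df2, df1)) df1 <<- rbind(df1, df2)
```

Comparing both columns means a reading with the same time on a different day still counts as new; drop Curr_Date from the key if only Curr_Time should matter.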
Here is an example with dplyr.
First, I assume df1 should not contain duplicated records (as would be the case if the logic had been right from the start):
library(dplyr)
df1 <- df1 %>% unique()
head(df1)
Curr_Time Curr_Date
1 11:43:34 10-04-2018
3 11:45:34 10-04-2018
I created another record, df2.new, as an example of a new record that should be appended to df1:
df2.new
Curr_Time Curr_Date
1 11:45:57 10-04-2018
For example:
df2.new %>% anti_join(df1)
Joining, by = c("Curr_Time", "Curr_Date")
Curr_Time Curr_Date
1 11:45:57 10-04-2018
df2 %>% anti_join(df1)
Joining, by = c("Curr_Time", "Curr_Date")
[1] Curr_Time Curr_Date
<0 rows> (or 0-length row.names)
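This works even when df1 has no rows yet, as long as it already has the Curr_Time and Curr_Date columns (anti_join() needs common columns to join by, so a column-less data.frame() as in the question would raise an error). You can therefore update your function like this: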
frameupdate <- function() {
  # keep only the rows of df2 that are not already in df1, then append them
  df1 <<- rbind(df1, anti_join(df2, df1))
}
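To check this update logic without waiting on the tcltk2 scheduler, here is a base R sketch that replays a few simulated 5-second readings. append_new_rows is a hypothetical stand-in for rbind(df1, anti_join(df2, df1)) built on %in% over a combined Curr_Time/Curr_Date key, and the readings are made up to mirror the question's sample data:

```r
# Hypothetical base R stand-in for rbind(df1, anti_join(df2, df1)):
# keep only the rows of `incoming` whose (Curr_Time, Curr_Date) pair is not
# already in `existing`, then append them.
append_new_rows <- function(existing, incoming) {
  key <- function(d) paste(d$Curr_Time, d$Curr_Date, sep = "|")
  fresh <- incoming[!(key(incoming) %in% key(existing)), , drop = FALSE]
  rbind(existing, fresh)
}

# Simulated stream: each reading arrives as a one-row df2
readings <- c("11:43:34", "11:43:34", "11:45:34", "11:45:34", "11:45:57")
df1 <- data.frame(Curr_Time = character(0), Curr_Date = character(0),
                  stringsAsFactors = FALSE)
for (tm in readings) {
  df2 <- data.frame(Curr_Time = tm, Curr_Date = "10-04-2018",
                    stringsAsFactors = FALSE)
  df1 <- append_new_rows(df1, df2)
}
print(df1)
```

Repeated readings are dropped, so df1 ends up with the three distinct times. Because an empty df1 yields an empty key vector, the first incoming row is always treated as new.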
Or, in base R without dplyr, you could do something like this:
frameupdate <- function() {
  # append df2 only if no row of df1 matches it on both Curr_Time and Curr_Date
  if (nrow(df1[df1$Curr_Time == df2$Curr_Time & df1$Curr_Date == df2$Curr_Date, ]) == 0)
    df1 <<- rbind(df1, df2)
}
frameupdate()
Running this function gives the expected output, even when df1 is empty:
df1
Curr_Time Curr_Date
1 11:43:34 10-04-2018
2 11:45:34 10-04-2018
3 11:45:57 10-04-2018
Upvotes: 1
Reputation: 2246
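After your if/else statement, you can follow up with a simple deduplication step:
library(dplyr)
# inside frameupdate(), assign back with <<- so the global df1 keeps the result
df1 <<- df1 %>%
  distinct()
after which df1 contains: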
# A tibble: 2 x 2
Curr_Time Curr_Date
<time> <chr>
1 11:43 10-04-2018
2 11:45 10-04-2018
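If you would rather not depend on dplyr, base R's duplicated() (or unique()) does the same after-the-fact cleanup. A small sketch on the duplicated sample rows from the question:

```r
# Sample df1 with the repeated rows shown in the question
df1 <- data.frame(
  Curr_Time = c("11:43:34", "11:43:34", "11:45:34", "11:45:34"),
  Curr_Date = rep("10-04-2018", 4),
  stringsAsFactors = FALSE
)

# duplicated() flags every row that repeats an earlier row across all columns;
# keeping the unflagged rows is equivalent to unique(df1) or dplyr::distinct(df1)
df1 <- df1[!duplicated(df1), ]
print(df1)
```

This keeps the first occurrence of each row, so the original row names (1 and 3) are retained.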
Upvotes: 1