I have two data frames, Before_data and After_data. Here is a sample of my data:
# Before_data
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 NA NA
80300 80000 80000 91903 30000 80300 NA NA
30000 80300 30000 80300 39999 30701 39999 90900
90900 90000 90000 90000 NA NA NA NA
80300 90900 80000 80000 80000 80000 80300 80300
# After_data
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 80300 80300
80300 80000 80000 91903 30000 80300 NA NA
90000 90000 90000 NA NA NA NA NA
90000 100703 90000 99999 90300 100101 99999 31505
80300 80000 40101 90900 40101 40100 80000 80300
I would like to count the number of changed cells between each pair of corresponding rows. For example, comparing row 1 of Before_data with row 1 of After_data should give 2 (P7 and P8 go from NA to 80300), while comparing row 2 of Before_data with row 2 of After_data should give 0.
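To make the example reproducible, the sample can be typed straight into R with read.table(text = ...) instead of reading the CSV files. The brute-force count below is only a sketch of the intended definition, assuming (consistent with the row 1 and row 2 examples) that NA versus a value counts as a change and NA versus NA does not; cell_changed and count_changes are hypothetical helper names, not part of any package.

```r
# Sample data from the question, typed in as text (NA marks an empty cell)
Before_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 NA NA
80300 80000 80000 91903 30000 80300 NA NA
30000 80300 30000 80300 39999 30701 39999 90900
90900 90000 90000 90000 NA NA NA NA
80300 90900 80000 80000 80000 80000 80300 80300
")
After_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 80300 80300
80300 80000 80000 91903 30000 80300 NA NA
90000 90000 90000 NA NA NA NA NA
90000 100703 90000 99999 90300 100101 99999 31505
80300 80000 40101 90900 40101 40100 80000 80300
")

# Hypothetical helper: has a single cell changed?
# NA vs NA -> no change; NA vs a value (either direction) -> change;
# otherwise compare the two values
cell_changed <- function(b, a) {
  if (is.na(b) && is.na(a)) return(FALSE)
  if (is.na(b) || is.na(a)) return(TRUE)
  b != a
}

# Hypothetical helper: number of changed cells in each pair of corresponding rows
count_changes <- function(before, after) {
  stopifnot(identical(dim(before), dim(after)))
  vapply(seq_len(nrow(before)), function(i) {
    sum(vapply(seq_len(ncol(before)),
               function(j) cell_changed(before[i, j], after[i, j]),
               logical(1)))
  }, integer(1))
}

count_changes(Before_data, After_data)
# [1] 2 0 8 7 6
```

The loop is slow on large tables, but it is useful as a reference to check a vectorised rowSums() solution against.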
I have tried the following:
library(daff)
Before_data <- read.csv("Before_data.csv")
After_data <- read.csv("After_data.csv")
# Cell-by-cell comparison of the two tables
dd <- diff_data(Before_data, After_data)
summary(dd)
write_diff(dd, "diff.csv")  # save the diff as a CSV file
render_diff(dd)             # display the highlighted diff
But this only shows me the changes themselves, not the number of changes per row.
Thanks,
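A slight variation on the answer by @Gregor:
ncol(Before_data) - rowSums(Before_data == After_data | (is.na(Before_data) & is.na(After_data)), na.rm = TRUE)
The na.rm = TRUE matters: comparing NA with a non-NA value gives NA, so without it any row containing such a cell (row 1, for instance) sums to NA instead of 2.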
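Because comparing NA with a number yields NA rather than FALSE, the na.rm argument to rowSums() decides whether rows with an NA-to-value change get a count or just NA. The sketch below runs the expression on an excerpt (rows 1 and 3 of the sample), once without and once with na.rm = TRUE; the variable names same, changes_naive and changes are only for illustration.

```r
# Rows 1 and 3 of the question's sample
Before_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 NA NA
30000 80300 30000 80300 39999 30701 39999 90900
")
After_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90000 80000 90000 80000 60000 61399 80300 80300
90000 90000 90000 NA NA NA NA NA
")

# TRUE where the cells agree (equal values, or NA in both tables);
# NA where exactly one side is NA, because e.g. NA == 80300 is NA
same <- (Before_data == After_data) | (is.na(Before_data) & is.na(After_data))

# Without na.rm, the NA cells make each affected row sum NA
changes_naive <- ncol(Before_data) - rowSums(same)
# With na.rm = TRUE, the NA cells are left out of the agreement count,
# so they end up counted as changes (2 for the first row, 8 for the second)
changes <- ncol(Before_data) - rowSums(same, na.rm = TRUE)

print(changes_naive)
print(changes)
```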
This should work:
rowSums(Before_data != After_data, na.rm = TRUE) +    # both values present but different
  rowSums(is.na(Before_data) & !is.na(After_data)) +  # NA replaced by a value
  rowSums(!is.na(Before_data) & is.na(After_data))    # value replaced by NA
It's easy to tell when non-NA values have changed: we can use !=. We have to be a little more careful with NA, because NA != NA gives NA rather than FALSE.
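A short sketch of the NA behaviour described above, followed by the three-term sum applied to an excerpt (rows 4 and 5 of the sample). The xor() form at the end is my own simplification, not part of the answer: the last two terms together count cells where exactly one side is NA, which is what xor() of the two is.na() matrices expresses.

```r
# Comparisons involving NA propagate NA instead of returning TRUE/FALSE
NA != NA  # NA
NA != 5   # NA

# Rows 4 and 5 of the question's sample
Before_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90900 90000 90000 90000 NA NA NA NA
80300 90900 80000 80000 80000 80000 80300 80300
")
After_data <- read.table(header = TRUE, text = "
P1 P2 P3 P4 P5 P6 P7 P8
90000 100703 90000 99999 90300 100101 99999 31505
80300 80000 40101 90900 40101 40100 80000 80300
")

# The answer's three terms: differing values, NA -> value, value -> NA
changes <- rowSums(Before_data != After_data, na.rm = TRUE) +
  rowSums(is.na(Before_data) & !is.na(After_data)) +
  rowSums(!is.na(Before_data) & is.na(After_data))

# Equivalent variant (assumption: same result by construction):
# xor() is TRUE exactly where one table has NA and the other does not
changes_xor <- rowSums(Before_data != After_data, na.rm = TRUE) +
  rowSums(xor(is.na(Before_data), is.na(After_data)))

print(changes)      # 7 for the first row, 6 for the second
print(changes_xor)
```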