Counting the number of changes between two data frames (row by row) in R

Question

I have two data frames, Before_data and After_data. Here's a sample of my data:

# Before_data
P1      P2      P3      P4      P5      P6      P7      P8
90000   80000   90000   80000   60000   61399   NA      NA
80300   80000   80000   91903   30000   80300   NA      NA
30000   80300   30000   80300   39999   30701   39999   90900
90900   90000   90000   90000   NA      NA      NA      NA
80300   90900   80000   80000   80000   80000   80300   80300

# After_data
P1      P2      P3      P4      P5      P6      P7      P8
90000   80000   90000   80000   60000   61399   80300   80300
80300   80000   80000   91903   30000   80300   NA      NA
90000   90000   90000   NA      NA      NA      NA      NA
90000   100703  90000   99999   90300   100101  99999   31505
80300   80000   40101   90900   40101   40100   80000   80300

I would like to count the number of changes between each pair of corresponding rows. For example, comparing row 1 of Before_data with row 1 of After_data should give 2.

The result is 0 if we compare row 2 in Before_data and row 2 in After_data.

I have tried the following:

library(daff)
Before_data <- read.csv("Before_data.csv")
After_data <- read.csv("After_data.csv")

# Compute the diff once and reuse it
dd <- diff_data(Before_data, After_data)
summary(dd)
write_diff(dd, "diff.csv")
render_diff(dd)

But this only shows me the changes themselves, not how many there are per row.

Thanks,

Gregor Thomas · Accepted Answer

This should work:

rowSums(Before_data != After_data, na.rm = TRUE) +
  rowSums(is.na(Before_data) & !is.na(After_data)) +
  rowSums(!is.na(Before_data) & is.na(After_data))

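It's easy to tell when non-NA values have changed: we can use !=. We have to be a little more careful with NA, because any comparison involving NA (including NA != NA) gives NA. That is why the first term drops those cells with na.rm = TRUE, and the other two terms count the cells that went from NA to a value or from a value to NA.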
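
As a quick check, here is a minimal sketch that rebuilds the sample from the question by hand and wraps the expression above in a helper. The name count_row_changes is made up for illustration, and the sketch assumes both data frames have the same dimensions and the same column order.

```r
# Hypothetical helper: counts cells that differ in each row, treating
# NA -> value and value -> NA as changes, and NA -> NA as no change.
count_row_changes <- function(before, after) {
  rowSums(before != after, na.rm = TRUE) +
    rowSums(is.na(before) & !is.na(after)) +
    rowSums(!is.na(before) & is.na(after))
}

# Sample data from the question, typed in row by row.
cols <- paste0("P", 1:8)
Before_data <- as.data.frame(matrix(c(
  90000, 80000, 90000, 80000, 60000, 61399, NA,    NA,
  80300, 80000, 80000, 91903, 30000, 80300, NA,    NA,
  30000, 80300, 30000, 80300, 39999, 30701, 39999, 90900,
  90900, 90000, 90000, 90000, NA,    NA,    NA,    NA,
  80300, 90900, 80000, 80000, 80000, 80000, 80300, 80300
), ncol = 8, byrow = TRUE, dimnames = list(NULL, cols)))

After_data <- as.data.frame(matrix(c(
  90000, 80000,  90000, 80000, 60000, 61399,  80300, 80300,
  80300, 80000,  80000, 91903, 30000, 80300,  NA,    NA,
  90000, 90000,  90000, NA,    NA,    NA,     NA,    NA,
  90000, 100703, 90000, 99999, 90300, 100101, 99999, 31505,
  80300, 80000,  40101, 90900, 40101, 40100,  80000, 80300
), ncol = 8, byrow = TRUE, dimnames = list(NULL, cols)))

changes <- count_row_changes(Before_data, After_data)
print(changes)  # counts for rows 1 to 5: 2 0 8 7 6
```

Rows 1 and 2 match the counts given in the question (2 and 0). If the two data frames could differ in shape or column order, they would need to be aligned before this comparison makes sense.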
