Mamba
Mamba

Reputation: 1203

How to determine changes in a dataframe?

I would like to log changes in a data.frame:

In the original dataframe I have the following structure:

library(dplyr)
library(compare)

origianl <- data_frame( name = c('John','Tim','Allan','Mitch'),

                          A = c(0,1,1,0),

                          B = c(1,0,0,1),

                          C = c(0,0,0,0),

                          D = c(1,0,0,1))

#> data.frame(origianl)

   name           A           B           C           D
1  John           0           1           0           1
2   Tim           1           0           0           0
3 Allan           1           0           0           0
4 Mitch           0           1           0           1

Here for instance John plays for B and D.

In the changed dataframe I have the following structure:

changed <- data_frame( name = c('John','Tim','Allan','Mitch'),

                                A = c(1,0,1,0),

                                B = c(0,0,0,0),

                                C = c(0,1,0,1),

                                D = c(1,0,0,1))

#> data.frame(changed)
   name           A           B           C           D
1  John           1           0           0           1
2   Tim           0           0           1           0
3 Allan           1           0           0           0
4 Mitch           0           0           1           1

Here for instance John changed teams and now plays for A and D.

There are couple of constraints:

I check original on daily basis. Player may play for multiple teams or for one and they can switch teams. They are assigned to at least one team.

What I have tried is to write a function which logs the changes in two specific columns when from

The desired output table may look like this, which is basically the changed with two additional columns logging the change:

#> data.frame(new_original)
   name           A           B           C           D  from    when
1  John           1           0           0           1   B     2017-01-01
2   Tim           0           0           1           0   A     2017-01-01
3 Allan           1           0           0           0   NA     NA
4 Mitch           0           0           1           1   B     2017-01-01

What I have tried so far:

my_func <- function(origianl, changed) {
# Get the difference
 difference <- anti_join(changed, origianl)
# find the changes in the original data
  differ <- origianl[origianl$name %in% difference$name, ]

# check whether condition is satisfyied
if (nrow(differ) == nrow(difference) &&
  nrow(anti_join(changed, origianl)) > 0)

{
 new_original <-
  cbind(changed, from = names(which(
    compare(difference[2:ncol(difference)], differ[2:ncol(differ)], allowAll =
              TRUE)$detailedResult == FALSE, when = Sys.Date()
  )))

 }

 return(new_original)
}

new_original = my_func(origianl, changed)

Which results in arguments imply differing number of rows: 4, 3 from the cbind commmand. Not sure how to get to the ideal output

Upvotes: 2

Views: 115

Answers (3)

David Arenburg
David Arenburg

Reputation: 92282

Here's a possible vectorized approach using base R. You haven't specified how is when column calculated (is this just the current date?) but here is how from could be calculated pretty easily

indx <- which((origianl[-1] - changed[-1]) == 1, arr.ind = TRUE)
changed[indx[, "row"], "from"] <- names(changed)[-1][indx[, "col"]]
changed
#    name A B C D from
# 1  John 1 0 0 1    B
# 2   Tim 0 0 1 0    A
# 3 Allan 1 0 0 0 <NA>
# 4 Mitch 0 0 1 1    B

Upvotes: 4

Sotos
Sotos

Reputation: 51582

Assuming that when is picked up by Sys.time(), then,

library(dplyr)

f1 <- function(x, y){
  d4 <-left_join(y, x, by = 'name')
  d5 <- d4[grepl('.x', names(d4), fixed = TRUE)]
  d6 <- d4[grepl('.y', names(d4), fixed = TRUE)]
  l1 <- apply(d5 - d6, 1, function(i) names(i)[i == -1])
  y$from <- sub('\\..*', '', unlist(ifelse(lengths(l1) == 0, NA, l1)))
  y$when <- Sys.time()
  y$when[is.na(y$from)] <- NA
  return(y)
}

f1(origianl, changed)

#   name A B C D from                when
#1  John 1 0 0 1    B 2017-03-14 15:37:59
#2   Tim 0 0 1 0    A 2017-03-14 15:37:59
#3 Allan 1 0 0 0 <NA>                <NA>
#4 Mitch 0 0 1 1    B 2017-03-14 15:37:59

Upvotes: 1

count
count

Reputation: 1338

I am not sure how you get the when, but this should work for from:

origianl <- as.data.frame(origianl)
id <- sapply(1:nrow(origianl), function(x) i[which( origianl[x,]==1)][!(i[which( origianl[x,]==1)] %in% i[which(changed[x,]==1)])])
origianl$from <- sapply(id,function(x) ifelse(length(x)==0,NA,x))

Upvotes: 0

Related Questions