Reputation: 1203
I would like to log changes in a data.frame:
In the original
dataframe I have the following structure:
library(dplyr)
library(compare)
origianl <- data_frame( name = c('John','Tim','Allan','Mitch'),
A = c(0,1,1,0),
B = c(1,0,0,1),
C = c(0,0,0,0),
D = c(1,0,0,1))
#> data.frame(origianl)
name A B C D
1 John 0 1 0 1
2 Tim 1 0 0 0
3 Allan 1 0 0 0
4 Mitch 0 1 0 1
Here for instance John plays for B and D.
In the changed
dataframe I have the following structure:
changed <- data_frame( name = c('John','Tim','Allan','Mitch'),
A = c(1,0,1,0),
B = c(0,0,0,0),
C = c(0,1,0,1),
D = c(1,0,0,1))
#> data.frame(changed)
name A B C D
1 John 1 0 0 1
2 Tim 0 0 1 0
3 Allan 1 0 0 0
4 Mitch 0 0 1 1
Here for instance John changed teams and now plays for A and D.
There are couple of constraints:
I check original
on daily basis. Player may play for multiple teams or for one and they can switch teams. They are assigned to at least one team.
What I have tried is to write a function which logs the changes in two specific columns when
from
The desired output table may look like this, which is basically the changed
with two additional columns logging the change:
#> data.frame(new_original)
name A B C D from when
1 John 1 0 0 1 B 2017-01-01
2 Tim 0 0 1 0 A 2017-01-01
3 Allan 1 0 0 0 NA NA
4 Mitch 0 0 1 1 B 2017-01-01
What I have tried so far:
my_func <- function(origianl, changed) {
# Get the difference
difference <- anti_join(changed, origianl)
# find the changes in the original data
differ <- origianl[origianl$name %in% difference$name, ]
# check whether condition is satisfyied
if (nrow(differ) == nrow(difference) &&
nrow(anti_join(changed, origianl)) > 0)
{
new_original <-
cbind(changed, from = names(which(
compare(difference[2:ncol(difference)], differ[2:ncol(differ)], allowAll =
TRUE)$detailedResult == FALSE, when = Sys.Date()
)))
}
return(new_original)
}
new_original = my_func(origianl, changed)
Which results in arguments imply differing number of rows: 4, 3
from the cbind commmand. Not sure how to get to the ideal output
Upvotes: 2
Views: 115
Reputation: 92282
Here's a possible vectorized approach using base R. You haven't specified how is when
column calculated (is this just the current date?) but here is how from
could be calculated pretty easily
indx <- which((origianl[-1] - changed[-1]) == 1, arr.ind = TRUE)
changed[indx[, "row"], "from"] <- names(changed)[-1][indx[, "col"]]
changed
# name A B C D from
# 1 John 1 0 0 1 B
# 2 Tim 0 0 1 0 A
# 3 Allan 1 0 0 0 <NA>
# 4 Mitch 0 0 1 1 B
Upvotes: 4
Reputation: 51582
Assuming that when
is picked up by Sys.time()
, then,
library(dplyr)
f1 <- function(x, y){
d4 <-left_join(y, x, by = 'name')
d5 <- d4[grepl('.x', names(d4), fixed = TRUE)]
d6 <- d4[grepl('.y', names(d4), fixed = TRUE)]
l1 <- apply(d5 - d6, 1, function(i) names(i)[i == -1])
y$from <- sub('\\..*', '', unlist(ifelse(lengths(l1) == 0, NA, l1)))
y$when <- Sys.time()
y$when[is.na(y$from)] <- NA
return(y)
}
f1(origianl, changed)
# name A B C D from when
#1 John 1 0 0 1 B 2017-03-14 15:37:59
#2 Tim 0 0 1 0 A 2017-03-14 15:37:59
#3 Allan 1 0 0 0 <NA> <NA>
#4 Mitch 0 0 1 1 B 2017-03-14 15:37:59
Upvotes: 1
Reputation: 1338
I am not sure how you get the when, but this should work for from:
origianl <- as.data.frame(origianl)
id <- sapply(1:nrow(origianl), function(x) i[which( origianl[x,]==1)][!(i[which( origianl[x,]==1)] %in% i[which(changed[x,]==1)])])
origianl$from <- sapply(id,function(x) ifelse(length(x)==0,NA,x))
Upvotes: 0