Reputation: 749
I have 12 columns:
11 are "day0", "day1" ... "day10" in a df called "forecasting"
1 is "within10closures" in a df called "headcount"
Both dfs have the same number of rows.
I want to subtract headcount$within10closures from forecasting$day10 unless it hits 0, in which case I want to subtract the remainder from day9, and so on until it gets to day0.
At the moment I have the code below which, whilst long and probably not the most efficient way, seems to work, but I want to do this same procedure with multiple sets of columns that vary in number. Hopefully it explains what I'm trying to do.
for (i in seq_len(nrow(forecasting))) {
  if (forecasting$day10[i] > headcount$within10closures[i]) {
    forecasting$day10[i] <- forecasting$day10[i] - headcount$within10closures[i]
  } else {
    # work out the remainder before zeroing the column
    subtraction <- headcount$within10closures[i] - forecasting$day10[i]
    forecasting$day10[i] <- 0
    if (forecasting$day9[i] > subtraction) {
      forecasting$day9[i] <- forecasting$day9[i] - subtraction
    } else {
      subtraction <- subtraction - forecasting$day9[i]
      forecasting$day9[i] <- 0
      if (forecasting$day8[i] > subtraction) {
        forecasting$day8[i] <- forecasting$day8[i] - subtraction
      } else {
        subtraction <- subtraction - forecasting$day8[i]
        forecasting$day8[i] <- 0
        if (forecasting$day7[i] > subtraction) {
          forecasting$day7[i] <- forecasting$day7[i] - subtraction
        } else {
          subtraction <- subtraction - forecasting$day7[i]
          forecasting$day7[i] <- 0
          if (forecasting$day6[i] > subtraction) {
            forecasting$day6[i] <- forecasting$day6[i] - subtraction
          } else {
            subtraction <- subtraction - forecasting$day6[i]
            forecasting$day6[i] <- 0
            if (forecasting$day5[i] > subtraction) {
              forecasting$day5[i] <- forecasting$day5[i] - subtraction
            } else {
              subtraction <- subtraction - forecasting$day5[i]
              forecasting$day5[i] <- 0
              if (forecasting$day4[i] > subtraction) {
                forecasting$day4[i] <- forecasting$day4[i] - subtraction
              } else {
                subtraction <- subtraction - forecasting$day4[i]
                forecasting$day4[i] <- 0
                if (forecasting$day3[i] > subtraction) {
                  forecasting$day3[i] <- forecasting$day3[i] - subtraction
                } else {
                  subtraction <- subtraction - forecasting$day3[i]
                  forecasting$day3[i] <- 0
                  if (forecasting$day2[i] > subtraction) {
                    forecasting$day2[i] <- forecasting$day2[i] - subtraction
                  } else {
                    subtraction <- subtraction - forecasting$day2[i]
                    forecasting$day2[i] <- 0
                    if (forecasting$day1[i] > subtraction) {
                      forecasting$day1[i] <- forecasting$day1[i] - subtraction
                    } else {
                      subtraction <- subtraction - forecasting$day1[i]
                      forecasting$day1[i] <- 0
                      forecasting$day0[i] <- forecasting$day0[i] - subtraction
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
I have tried to write a for loop that does the same thing but lets me control which column range the subtraction is applied to. I don't really know what I'm doing, though, and it doesn't produce the right numbers at all (apologies for the messy code):
for(x in 1:11){
for(i in 1:n){
ifelse(x==1, subtraction <- headcount$within10closures[i], subtraction <- (subtraction - (forecasting[i,grep("day10", colnames(forecasting))+1-x])))
ifelse(forecasting[i,grep("day10", colnames(forecasting))+1-x]>=subtraction, forecasting[i,grep("day10", colnames(forecasting))+1-x] <- forecasting[i,grep("day10", colnames(forecasting))+1-x] - subtraction, forecasting[i,grep("day10", colnames(forecasting))+1-x] <- 0)
}
}
Essentially what I'm asking is how do I efficiently cumulatively subtract between columns based on another column's value whilst controlling for the number of columns to subtract from?
Simplified input:
forecasting: (values could be any number)
day0 | day1 | day2 | day3
-----+------+------+------
1 | 2 | 4 | 18
10 | 10 | 10 | 10
7 | 10 | 10 | 10
headcount: (values could be any number)
within10closures
----------------
6
10
35
Desired outcome:
day0 | day1 | day2 | day3
-----+------+------+------
1 | 2 | 4 | 12
10 | 10 | 10 | 0
2 | 0 | 0 | 0
data
forecasting <- data.frame(matrix(rep(10, 12), nrow = 3))
colnames(forecasting) <- paste0("day", 0:3)
headcount <- data.frame(within10closures = c(6, 10, 35))
Edit: forecasting dataframe is not always all 10, consider below as example input:
set.seed(1)
forecasting <- data.frame(matrix(sample(1:10, 12, replace = TRUE), nrow = 3))
colnames(forecasting) <- paste0("day", 0:3)
# day0 day1 day2 day3
# 1 3 10 10 1
# 2 4 3 7 3
# 3 6 9 7 2
Upvotes: 1
Views: 585
Reputation: 15784
Revised after the question was updated:
Data:
set.seed(1234)
forecasting <- data.frame(
day0=sample(1:10,4),
day1=sample(1:10,4),
day2=sample(1:10,4),
day3=sample(1:10,4),
day4=sample(1:10,4)
)
headcount <- data.frame(within10closures=c(6,10,25,12))
> print(forecasting)
day0 day1 day2 day3 day4
1 2 9 7 3 3
2 6 6 5 9 10
3 5 1 6 10 2
4 8 2 4 6 8
Code:
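for (i in seq_along(headcount$within10closures)) {
  v <- headcount$within10closures[i]  # amount still to subtract from row i
  tmp <- c()                          # new row values, built from the last column backwards
  if (sum(forecasting[i, ]) - v < 0) {
    # more to subtract than the row holds: put the deficit in the first column
    forecasting[i, ] <- c(sum(forecasting[i, ]) - v, rep(0, ncol(forecasting) - 1))
  } else {
    for (x in rev(forecasting[i, ])) {
      tmp <- c(tmp, ifelse(x - v < 0, 0, x - v))  # what is left in this column
      v <- ifelse(v - x < 1, 0, v - x)            # what carries over to the next column
    }
    forecasting[i, ] <- rev(tmp)
  }
}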
Basically this loops over the values to subtract. If the value is greater than the row total, it builds the row with the negative difference as its first element and 0 everywhere else.
Otherwise it loops over the corresponding row in reverse (rev) and takes the difference, setting it to 0 if there is more to subtract than the current value.
Then it removes the current value from what is left to subtract, setting that to 0 if it gets under 1 (0 or negative).
Finally it reverses the built vector (tmp) and uses it to replace the original forecasting row.
This gives:
> forecasting
day0 day1 day2 day3 day4
1 2 9 7 0 0
2 6 6 5 9 0
3 -1 0 0 0 0
4 8 2 4 2 0
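If you need to run this on several sets of columns, the same idea can be wrapped in a function. Below is a minimal sketch, not a drop-in replacement: subtract_from_right and subtract_columns are hypothetical names I made up, it assumes the values in the chosen columns are non-negative, and it swaps the inner loop for a cumulative sum (cumsum) clamped with pmin/pmax. It keeps the convention from the loop above of putting any deficit in the first column.

```r
# Hypothetical helper: subtract `amount` from the vector `x`, consuming the
# last element first and moving left. Assumes the values in `x` are >= 0.
subtract_from_right <- function(x, amount) {
  r <- rev(as.numeric(x))            # work from the last column backwards
  left <- cumsum(r) - amount         # running total minus the amount to remove
  out <- rev(pmin(r, pmax(left, 0))) # clamp each column between 0 and its old value
  deficit <- sum(r) - amount
  if (deficit < 0) out[1] <- deficit # same convention as the loop above
  out
}

# Hypothetical wrapper: apply the helper row by row to the columns in `cols`.
subtract_columns <- function(df, amounts, cols = names(df)) {
  for (i in seq_len(nrow(df))) {
    df[i, cols] <- subtract_from_right(unlist(df[i, cols]), amounts[i])
  }
  df
}

# Simplified input from the question
forecasting <- data.frame(day0 = c(1, 10, 7), day1 = c(2, 10, 10),
                          day2 = c(4, 10, 10), day3 = c(18, 10, 10))
headcount <- data.frame(within10closures = c(6, 10, 35))

result <- subtract_columns(forecasting, headcount$within10closures,
                           cols = paste0("day", 0:3))
print(result)
```

On the simplified input from the question this should reproduce the desired outcome. The cols argument is what lets you choose which set of columns to subtract from, e.g. paste0("day", 0:10) for the full data.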
Previous answer:
This seems to get what you're after. This first version does not handle negative numbers; a second version that does is further below:
forecasting <- data.frame(day0=rep(10,3),day1=rep(10,3),day2=rep(10,3),day3=rep(10,3))
headcount <- data.frame(within10closures=c(6,10,35))
nb <- rowSums(forecasting)-headcount$within10closures
result <- as.data.frame(t(sapply(nb, function(x) {
c(
rep(10,x%/%10),
ifelse(10-x%%10==10,0,10-x%%10),
rep(0 , ( ncol(forecasting) - x%/%10 -1) )
)
}
)))
colnames(result) <- paste0("day",1:ncol(forecasting))
First I compute the sum of each row and subtract the corresponding within10closures value.
Now for each value (the sapply loop), I get the number of columns holding a full 10 (the integer division x%/%10), the remainder part, and the number of columns at 0 needed to complete the row.
For the remainder (x%%10) there are two cases: when it is 0 we wish to display 0 and not 10, so there's an ifelse within the vector construction to set 0 if we get 0, or 10 minus the remainder if it is positive.
The sapply call gives us a matrix like this:
[,1] [,2] [,3]
[1,] 10 10 5
[2,] 10 10 0
[3,] 10 10 0
[4,] 6 0 0
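The matrix above has one column per element of nb because sapply binds each returned vector as a column. A small illustration with made-up values (not the data from the question):

```r
# Each call returns a length-2 vector; sapply binds the results as columns.
m <- sapply(1:3, function(x) c(x, x * 10))
dim(m)     # 2 rows, 3 columns: one column per input value
dim(t(m))  # 3 rows, 2 columns: one row per input value
```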
To turn it back into a data.frame we need to transpose it with t( ) and coerce it with as.data.frame. The final touch is naming the columns day1 to ncol(forecasting) (this could run from 0 to ncol(forecasting)-1 if needed, using 0:(ncol(forecasting)-1); pay attention to the parentheses here to get the proper values, because without them the range is expanded before the subtraction happens).
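To see why the parentheses matter: in R the : operator binds more tightly than binary minus. A quick check, with n standing in for ncol(forecasting):

```r
n <- 4                          # stand-in for ncol(forecasting)
without_parens <- 0:n - 1       # parsed as (0:n) - 1
with_parens <- 0:(n - 1)
print(without_parens)           # -1 0 1 2 3
print(with_parens)              # 0 1 2 3
print(paste0("day", with_parens))
```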
To handle negative numbers we need some more conditions, as -5%%10 returns 5:
result <- as.data.frame(t(sapply(nb, function(x) {
c(
rep(10,ifelse(x%/%10>-1, x%/%10, 0) ),
ifelse(x%%10==0,0,ifelse(x >0, 10 - x%%10, x) ),
rep(0 , ( ncol(forecasting) - ifelse( x%/%10 > -1, x%/%10, 0) - 1 ) )
)
}
)))
Here an ifelse is added around the integer division to get 0 or a positive number and drop negative numbers.
One more ifelse on the remainder uses it only if x is positive; if not, we keep the negative value even if its magnitude is greater than 10, i.e. you may end up with a row like -35 0 0 0
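For reference, this is how R's remainder and integer division behave on a negative number, which is why the extra conditions are needed:

```r
x <- -5
r <- x %% 10    # 5: the result takes the sign of the divisor
q <- x %/% 10   # -1: integer division rounds towards minus infinity
print(c(remainder = r, quotient = q))
print(q * 10 + r)  # -5: quotient and remainder always recombine to x
```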
Upvotes: 5