Reputation: 81
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
13 0 14 0 16 0 22 0 20 0 18 0
30 0 30 0 0 0 0 0 0 0 30 0
0 0 29 33 0 48 0 49 0 50 0 33
0 45 30 0 0 55 0 69 55 0 0 40
0 54 0 45 0 48 0 73 0 46 0 36
16 0 15 0 13 0 16 0 24 0 23 0
0 32 26 0 0 57 0 65 49 0 0 32
6 0 6 0 5 0 7 0 6 0 6 0
0 665 310 271 0 646 0 706 0 585 0 516
0 70 41 0 0 101 0 112 112 0 0 90
I currently have this sparse data frame in which I want to replace all the 0's with half of the value to the right and replace zero's in December with half of January's value.
For example the first row would should look like:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
6.5 7 7 8 8 11 11 10 10 9 9 6.5
So essentially the total for the year should remained unchanged and should just be spread across the months more evenly.
I have tried to use loops however my actual data spans multiple years and hundreds of thousand rows and loops are much too slow. I know apply is the way to go but can't figure out what my function should be or how to apply it to rows and columns.
Thanks
Upvotes: 3
Views: 221
Reputation: 28825
I made dfout
to leave the original df
intact. dfv2
gives us what is the value that should be put in the dfout
and dfv3
helps to find which columns should be divided (those which has been replaced). Condition df==0
is the key element in this solution. I am hoping that someone posts a data.table or dplyr solution to this.
dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])
dfv3 <- data.frame(Dec = df[,12], df[,-12])
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
rowSums(df) - rowSums(dfout)
# [1] 0 0 0 0 0 0 0 0 0 0
dfout
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1 6.5 7.0 7.0 8.0 8.0 11.0 11.0 10.0 10.0 9.0 9.0 6.5
# 2 15.0 15.0 15.0 0.0 0.0 0.0 0.0 0.0 0.0 15.0 15.0 15.0
# 3 0.0 14.5 14.5 33.0 24.0 24.0 24.5 24.5 25.0 25.0 16.5 16.5
# 4 22.5 22.5 30.0 0.0 27.5 27.5 34.5 34.5 55.0 0.0 20.0 20.0
# 5 27.0 27.0 22.5 22.5 24.0 24.0 36.5 36.5 23.0 23.0 18.0 18.0
# 6 8.0 7.5 7.5 6.5 6.5 8.0 8.0 12.0 12.0 11.5 11.5 8.0
# 7 16.0 16.0 26.0 0.0 28.5 28.5 32.5 32.5 49.0 0.0 16.0 16.0
# 8 3.0 3.0 3.0 2.5 2.5 3.5 3.5 3.0 3.0 3.0 3.0 3.0
# 9 332.5 332.5 310.0 271.0 323.0 323.0 353.0 353.0 292.5 292.5 258.0 258.0
# 10 35.0 35.0 41.0 0.0 50.5 50.5 56.0 56.0 112.0 0.0 45.0 45.0
Data:
df <- structure(list(Jan = c(13L, 30L, 0L, 0L, 0L, 16L, 0L, 6L, 0L,
0L), Feb = c(0L, 0L, 0L, 45L, 54L, 0L, 32L, 0L, 665L, 70L), Mar = c(14L,
30L, 29L, 30L, 0L, 15L, 26L, 6L, 310L, 41L), Apr = c(0L, 0L,
33L, 0L, 45L, 0L, 0L, 0L, 271L, 0L), May = c(16L, 0L, 0L, 0L,
0L, 13L, 0L, 5L, 0L, 0L), Jun = c(0L, 0L, 48L, 55L, 48L, 0L,
57L, 0L, 646L, 101L), Jul = c(22L, 0L, 0L, 0L, 0L, 16L, 0L, 7L,
0L, 0L), Aug = c(0L, 0L, 49L, 69L, 73L, 0L, 65L, 0L, 706L, 112L
), Sep = c(20L, 0L, 0L, 55L, 0L, 24L, 49L, 6L, 0L, 112L), Oct = c(0L,
0L, 50L, 0L, 46L, 0L, 0L, 0L, 585L, 0L), Nov = c(18L, 30L, 0L,
0L, 0L, 23L, 0L, 6L, 0L, 0L), Dec = c(0L, 0L, 33L, 40L, 36L,
0L, 32L, 0L, 516L, 90L)), .Names = c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class =
"data.frame", row.names = c(NA, -10L))
If you want to make all the elements non-zero you can use a while
:
original.df <- df
while(any(df==0)) {
dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])
dfv3 <- data.frame(Dec = df[,12], df[,-12])
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
df <- dfout
}
rowSums(original.df) - rowSums(dfout)
# [1] 0 0 0 0 0 0 0 0 0 0
dfout
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1 6.5 7.0 7.0 8.00 8.00 11.00 11.00 10.0 10.0 9.0 9 6.5
# 2 15.0 15.0 15.0 0.23 0.23 0.47 0.94 1.9 3.8 7.5 15 15.0
# 3 7.2 7.2 14.5 33.00 24.00 24.00 24.50 24.5 25.0 25.0 16 16.5
# 4 22.5 22.5 30.0 13.75 13.75 27.50 34.50 34.5 55.0 10.0 10 20.0
# 5 27.0 27.0 22.5 22.50 24.00 24.00 36.50 36.5 23.0 23.0 18 18.0
# 6 8.0 7.5 7.5 6.50 6.50 8.00 8.00 12.0 12.0 11.5 12 8.0
# 7 16.0 16.0 26.0 14.25 14.25 28.50 32.50 32.5 49.0 8.0 8 16.0
# 8 3.0 3.0 3.0 2.50 2.50 3.50 3.50 3.0 3.0 3.0 3 3.0
# 9 332.5 332.5 310.0 271.00 323.00 323.00 353.00 353.0 292.5 292.5 258 258.0
# 10 35.0 35.0 41.0 25.25 25.25 50.50 56.00 56.0 112.0 22.5 22 45.0
Upvotes: 5