Michael Flatebo
Michael Flatebo

Reputation: 81

R: How to use Apply function taking multiple inputs across rows and columns

Jan   Feb   Mar   Apr   May   Jun   Jul   Aug  Sep   Oct   Nov   Dec  

 13    0     14     0    16     0    22     0    20     0    18     0
 30    0     30     0     0     0     0     0     0     0    30     0
 0     0     29    33     0    48     0    49     0    50     0    33
 0     45    30     0     0    55     0    69    55     0     0    40
 0     54    0     45     0    48     0    73     0    46     0    36
 16    0     15     0    13     0    16     0    24     0    23     0
 0     32    26     0     0    57     0    65    49     0     0    32
 6     0     6      0     5     0     7     0     6     0     6     0
 0     665   310   271    0   646     0   706     0   585     0   516
 0     70    41     0     0   101     0   112   112     0     0    90

I currently have this sparse data frame in which I want to replace all the 0's with half of the value to the right and replace zero's in December with half of January's value.

For example the first row would should look like:

Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec

6.5    7    7     8     8     11     11    10    10     9     9    6.5

So essentially the total for the year should remained unchanged and should just be spread across the months more evenly.

I have tried to use loops however my actual data spans multiple years and hundreds of thousand rows and loops are much too slow. I know apply is the way to go but can't figure out what my function should be or how to apply it to rows and columns.

Thanks

Upvotes: 3

Views: 221

Answers (1)

M--
M--

Reputation: 28825

I made dfout to leave the original df intact. dfv2 gives us what is the value that should be put in the dfout and dfv3 helps to find which columns should be divided (those which has been replaced). Condition df==0 is the key element in this solution. I am hoping that someone posts a or solution to this.

dfout <- df
dfv2 <- data.frame(df[,-1], Jan = df[,1])   
dfv3 <- data.frame(Dec = df[,12], df[,-12]) 
dfout[df==0] <- dfv2[df==0]
dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2

rowSums(df) - rowSums(dfout)

#  [1] 0 0 0 0 0 0 0 0 0 0


dfout

#      Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec 
# 1    6.5   7.0   7.0   8.0   8.0  11.0  11.0  10.0  10.0   9.0   9.0   6.5 
# 2   15.0  15.0  15.0   0.0   0.0   0.0   0.0   0.0   0.0  15.0  15.0  15.0 
# 3    0.0  14.5  14.5  33.0  24.0  24.0  24.5  24.5  25.0  25.0  16.5  16.5 
# 4   22.5  22.5  30.0   0.0  27.5  27.5  34.5  34.5  55.0   0.0  20.0  20.0 
# 5   27.0  27.0  22.5  22.5  24.0  24.0  36.5  36.5  23.0  23.0  18.0  18.0 
# 6    8.0   7.5   7.5   6.5   6.5   8.0   8.0  12.0  12.0  11.5  11.5   8.0 
# 7   16.0  16.0  26.0   0.0  28.5  28.5  32.5  32.5  49.0   0.0  16.0  16.0 
# 8    3.0   3.0   3.0   2.5   2.5   3.5   3.5   3.0   3.0   3.0   3.0   3.0 
# 9  332.5 332.5 310.0 271.0 323.0 323.0 353.0 353.0 292.5 292.5 258.0 258.0 
# 10  35.0  35.0  41.0   0.0  50.5  50.5  56.0  56.0 112.0   0.0  45.0  45.0

Data:

df <- structure(list(Jan = c(13L, 30L, 0L, 0L, 0L, 16L, 0L, 6L, 0L, 
0L), Feb = c(0L, 0L, 0L, 45L, 54L, 0L, 32L, 0L, 665L, 70L), Mar = c(14L, 
30L, 29L, 30L, 0L, 15L, 26L, 6L, 310L, 41L), Apr = c(0L, 0L, 
33L, 0L, 45L, 0L, 0L, 0L, 271L, 0L), May = c(16L, 0L, 0L, 0L, 
0L, 13L, 0L, 5L, 0L, 0L), Jun = c(0L, 0L, 48L, 55L, 48L, 0L, 
57L, 0L, 646L, 101L), Jul = c(22L, 0L, 0L, 0L, 0L, 16L, 0L, 7L, 
0L, 0L), Aug = c(0L, 0L, 49L, 69L, 73L, 0L, 65L, 0L, 706L, 112L
), Sep = c(20L, 0L, 0L, 55L, 0L, 24L, 49L, 6L, 0L, 112L), Oct = c(0L, 
0L, 50L, 0L, 46L, 0L, 0L, 0L, 585L, 0L), Nov = c(18L, 30L, 0L, 
0L, 0L, 23L, 0L, 6L, 0L, 0L), Dec = c(0L, 0L, 33L, 40L, 36L, 
0L, 32L, 0L, 516L, 90L)), .Names = c("Jan", "Feb", "Mar", "Apr", 
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = 
"data.frame", row.names = c(NA, -10L))

Update:

If you want to make all the elements non-zero you can use a while:

original.df <- df

while(any(df==0)) {
                   dfout <- df
                   dfv2 <- data.frame(df[,-1], Jan = df[,1])   
                   dfv3 <- data.frame(Dec = df[,12], df[,-12]) 
                   dfout[df==0] <- dfv2[df==0]
                   dfout[df==0 | dfv3==0] <- dfout[df==0 | dfv3==0] / 2
                   df <- dfout
                  }

rowSums(original.df) - rowSums(dfout)

#  [1] 0 0 0 0 0 0 0 0 0 0

dfout

#      Jan   Feb   Mar    Apr    May    Jun    Jul   Aug   Sep   Oct Nov   Dec 
# 1    6.5   7.0   7.0   8.00   8.00  11.00  11.00  10.0  10.0   9.0   9   6.5 
# 2   15.0  15.0  15.0   0.23   0.23   0.47   0.94   1.9   3.8   7.5  15  15.0 
# 3    7.2   7.2  14.5  33.00  24.00  24.00  24.50  24.5  25.0  25.0  16  16.5 
# 4   22.5  22.5  30.0  13.75  13.75  27.50  34.50  34.5  55.0  10.0  10  20.0 
# 5   27.0  27.0  22.5  22.50  24.00  24.00  36.50  36.5  23.0  23.0  18  18.0 
# 6    8.0   7.5   7.5   6.50   6.50   8.00   8.00  12.0  12.0  11.5  12   8.0 
# 7   16.0  16.0  26.0  14.25  14.25  28.50  32.50  32.5  49.0   8.0   8  16.0 
# 8    3.0   3.0   3.0   2.50   2.50   3.50   3.50   3.0   3.0   3.0   3   3.0 
# 9  332.5 332.5 310.0 271.00 323.00 323.00 353.00 353.0 292.5 292.5 258 258.0 
# 10  35.0  35.0  41.0  25.25  25.25  50.50  56.00  56.0 112.0  22.5  22  45.0

Upvotes: 5

Related Questions