Reputation: 1390
Apologies in advance if this has been addressed before, but I've tried looking through all the questions related to ddply, sapply, and apply, and can't for the life of me figure this one out...
I've written a function, countMonths, that takes the day and month of the meter-read date and the total number of days in the billing cycle as arguments, and returns the number of calendar months that the billing cycle spans:
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # February fixed at 28 (leap years ignored)
  if (month < 1 | month > 12 | floor(month) != month) {
    cat("Invalid month value, must be an integer from 1 to 12\n")
  } else if (day < 1 | day > month.days[month]) {
    cat("Invalid day value, must be between 1 and ", month.days[month], "\n", sep = "")
  } else if (cycle.days < 0) {
    cat("Invalid cycle.days value, must be >= 0\n")
  } else {
    nmonths <- 1
    day.ct <- cycle.days - day
    while (day.ct > 0) {
      nmonths <- nmonths + 1
      month <- ifelse(month == 1, 12, month - 1)  # sets to previous month
      day.ct <- day.ct - month.days[month]        # subtracts days of previous month
    }
    nmonths
  }
}
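As a worked illustration of the loop in countMonths (an editorial trace, not part of the original question), here is row 1 of the sample data below stepped through by hand: read on 2 September with a 29-day cycle. The variable names mirror the function's. The cycle reaches back into August, so the result should be 2.

```r
# Row 1 of the sample: read on September 2, billing cycle of 29 days
day <- 2; month <- 9; cycle.days <- 29
month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

nmonths <- 1                    # the read month (September) always counts
day.ct <- cycle.days - day      # 27 cycle days fall before September 1
while (day.ct > 0) {
  nmonths <- nmonths + 1                      # step back into the previous month
  month <- ifelse(month == 1, 12, month - 1)  # 9 -> 8 (August)
  day.ct <- day.ct - month.days[month]        # 27 - 31 = -4, so the loop stops
}
nmonths   # 2
```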
I'd like to apply this function to every row in a data.frame containing billing records by customer, e.g.
> head(cons2[-1],10)
kwh cycle.days read.date row.index year month day kwh.per.day
1 381 29 2010-09-02 1 2010 9 2 13.137931
2 280 32 2010-10-04 2 2010 10 4 8.750000
3 282 29 2010-11-02 3 2010 11 2 9.724138
4 330 34 2010-12-06 4 2010 12 6 9.705882
5 371 30 2011-01-05 5 2011 1 5 12.366667
6 405 30 2011-02-04 6 2011 2 4 13.500000
7 441 32 2011-03-08 7 2011 3 8 13.781250
8 290 29 2011-04-06 8 2011 4 6 10.000000
9 296 29 2011-05-05 9 2011 5 5 10.206897
10 378 32 2011-06-06 10 2011 6 6 11.812500
> dput(head(cons2[-1],10))
structure(list(kwh = c(381L, 280L, 282L, 330L, 371L, 405L, 441L,
290L, 296L, 378L), cycle.days = c(29L, 32L, 29L, 34L, 30L, 30L,
32L, 29L, 29L, 32L), read.date = structure(c(1283385600, 1286150400,
1288656000, 1291593600, 1294185600, 1296777600, 1299542400, 1302048000,
1304553600, 1307318400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
row.index = 1:10, year = c(2010, 2010, 2010, 2010, 2011,
2011, 2011, 2011, 2011, 2011), month = c(9, 10, 11, 12, 1,
2, 3, 4, 5, 6), day = c(2L, 4L, 2L, 6L, 5L, 4L, 8L, 6L, 5L,
6L), kwh.per.day = c(13.1379310344828, 8.75, 9.72413793103448,
9.70588235294118, 12.3666666666667, 13.5, 13.78125, 10, 10.2068965517241,
11.8125)), .Names = c("kwh", "cycle.days", "read.date", "row.index",
"year", "month", "day", "kwh.per.day"), row.names = c(NA, 10L
), class = "data.frame")
I tried a couple of options, and none worked well. Specifically, I need to be able to pass the value of a given var as a scalar (or length-1 vector) for each row in the data frame, but the values always get passed as whole vectors:
> cons2$tot.months <- countMonths(cons2$day, cons2$month, cons2$cycle.days)
Warning messages:
1: In if (month < 1 | month > 12 | floor(month) != month) { :
the condition has length > 1 and only the first element will be used
2: In if (day < 1 | day > month.days[month]) { :
the condition has length > 1 and only the first element will be used
3: In if (cycle.days < 0) { :
the condition has length > 1 and only the first element will be used
4: In while (day.ct > 0) { :
the condition has length > 1 and only the first element will be used
5: In while (day.ct > 0) { :
the condition has length > 1 and only the first element will be used
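These warnings arise because cons2$month (like the other columns) is the whole column, so each comparison inside the if() and while() conditions returns one logical value per row, and if()/while() only look at the first. A minimal sketch, using the month column of the ten sample rows above:

```r
# month column from the ten sample rows above
month <- c(9, 10, 11, 12, 1, 2, 3, 4, 5, 6)

cond <- month < 1 | month > 12 | floor(month) != month
length(cond)   # 10: one TRUE/FALSE per row, but if() needs exactly one

# To test every row at once, collapse the vector explicitly
any(cond)      # FALSE: no row has an invalid month
```

Note that since R 4.2.0 a condition of length greater than one in if() or while() is an error rather than a warning, so on current R the call above stops instead of silently using only the first row.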
I finally was able to get the right result using ddply, treating each row as its own group, but it takes a LONG time:
cons2 <- ddply(cons2, .(account, year, month, day), transform,
tot.months = countMonths(day, month, cycle.days)
)
Is there a better way to apply this function to each row of my data frame? Or, as a related question, how can I pass variables from a data frame as scalar arguments (the value from a given row) instead of the vector of all values of that var in the data frame? I'd especially appreciate it if someone can point out where I'm going wrong conceptually in my thinking.
Upvotes: 0
Views: 156
Reputation: 18323
To get the function to work, you can use mapply, which successively applies your function to the corresponding elements of all the vectors you pass to it. So you could do:
cons2$tot.months <- mapply(countMonths, cons2$day, cons2$month, cons2$cycle.days)
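Here is a self-contained sketch of that mapply call on a few made-up rows (the data frame df is hypothetical), using a condensed copy of countMonths with the validation branches left out. Base R's Vectorize() is shown alongside as an equivalent wrapper, since it is built on mapply. One caveat with the original function: its error branches return NULL (the value of cat()), so if any row is invalid, mapply cannot simplify the results and returns a list instead of a numeric vector.

```r
# Condensed copy of countMonths (input validation omitted for brevity)
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- ifelse(month == 1, 12, month - 1)  # previous month
    day.ct <- day.ct - month.days[month]        # subtract its days
  }
  nmonths
}

# Hypothetical billing rows
df <- data.frame(day = c(2, 4, 1), month = c(9, 10, 3), cycle.days = c(29, 32, 40))

# mapply passes the i-th element of each vector to the i-th call
df$tot.months <- mapply(countMonths, df$day, df$month, df$cycle.days)

# Vectorize() wraps the same idea into a function that accepts whole columns
countMonthsV <- Vectorize(countMonths)
df$tot.months2 <- countMonthsV(df$day, df$month, df$cycle.days)
df
```

Both columns should come out as 2, 2, 3; the third row's 40-day cycle ending 1 March reaches back into January.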
There are easier ways to do this, as I mentioned in my comment. For example, I think this would work:
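cons2$read.date <- as.Date(cons2$read.date)
# month number counted from January 1900 (POSIXlt$year is years since 1900, $mon is 0-11)
monnb <- function(d) { lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01")); lt$year * 12 + lt$mon }
# number of month boundaries between d1 and d2
mondf <- function(d1, d2) monnb(d2) - monnb(d1)
# cycle start = read.date - cycle.days; + 1 counts the read month itself
mondf(cons2$read.date - cons2$cycle.days, cons2$read.date) + 1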
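To check the month-difference approach, here is a sketch on three read dates: the first two match rows 1 and 2 of the sample data, and the third is a hypothetical 40-day cycle ending 1 March 2011 that reaches back into January. It treats read.date minus cycle.days as the cycle start, as the line above does.

```r
monnb <- function(d) { lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01")); lt$year * 12 + lt$mon }
mondf <- function(d1, d2) monnb(d2) - monnb(d1)

read.date  <- as.Date(c("2010-09-02", "2010-10-04", "2011-03-01"))
cycle.days <- c(29, 32, 40)

start <- read.date - cycle.days          # 2010-08-04, 2010-09-02, 2011-01-20
tot.months <- mondf(start, read.date) + 1
tot.months                               # 2 2 3
```

Because monnb works on whole Date vectors, the whole column is handled in one call with no per-row loop.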
Also, I noticed that you were trying to catch all the conditions where your function wouldn't work, which is great! There is a very convenient function called stopifnot that serves this purpose:
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # leap years ignored
  # one condition per argument: checked in order, and the first failure is named in the error
  stopifnot(month >= 1, month <= 12, floor(month) == month,
            cycle.days >= 0,
            day >= 1, day <= month.days[month])
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- ifelse(month == 1, 12, month - 1)  # sets to previous month
    day.ct <- day.ct - month.days[month]        # subtracts days of previous month
  }
  nmonths
}
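A small sketch of why it can help to give stopifnot one condition per argument rather than a single condition joined with &. The helpers combined_ok and separate_ok are hypothetical, used only to compare the two styles. With month = 0, month.days[0] is a zero-length vector, so a combined & expression collapses to logical(0), which stopifnot accepts. With separate arguments, stopifnot evaluates them in order and stops at the first one that fails.

```r
month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# One combined condition: month.days[0] is numeric(0), so the whole
# & expression becomes logical(0), which stopifnot() lets through
combined_ok <- function(day, month) {
  stopifnot(month >= 1 & month <= 12 & day >= 1 & day <= month.days[month])
  TRUE
}

# Separate conditions: evaluated one by one, first failure is reported
separate_ok <- function(day, month) {
  stopifnot(month >= 1, month <= 12, day >= 1, day <= month.days[month])
  TRUE
}

combined_ok(1, 0)                                # TRUE: the bad month slips through
msg <- tryCatch(separate_ok(1, 0), error = conditionMessage)
msg                                              # "month >= 1 is not TRUE"
```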
As for your function itself, I think it works, but it doesn't take advantage of R's vectorized operations. The monnb/mondf functions I got from that other answer are very slick because they accept a whole vector of dates at once, rather than looping through the dates one at a time.
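If you would rather keep your own month-stepping logic but still work on whole columns, one option (a sketch, not taken from the answer above; the name countMonthsVec is hypothetical and the input checks are left out) is to run the loop once per month stepped back instead of once per row, updating only the rows whose cycle still reaches into an earlier month:

```r
countMonthsVec <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # leap years ignored
  nmonths <- rep(1, length(day))   # every cycle includes its read month
  day.ct <- cycle.days - day       # cycle days left before the 1st of the read month
  active <- day.ct > 0             # rows whose cycle reaches into an earlier month
  while (any(active)) {
    nmonths[active] <- nmonths[active] + 1
    month[active] <- ifelse(month[active] == 1, 12, month[active] - 1)  # previous month
    day.ct[active] <- day.ct[active] - month.days[month[active]]        # subtract its days
    active <- day.ct > 0
  }
  nmonths
}

countMonthsVec(day = c(2, 4, 1), month = c(9, 10, 3), cycle.days = c(29, 32, 40))  # 2 2 3
```

On a data frame this would be called once, e.g. countMonthsVec(cons2$day, cons2$month, cons2$cycle.days). The while loop then runs only as many times as the longest cycle spans months, no matter how many rows there are.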
Upvotes: 1