Question:
I'm trying to create a function that will yield the average cost per disease. I'm sure this isn't the best method, but it does seem to work when I feed it a single column index. However, when I wrap it in a function and run it, I get a parse error:
Error in parse(text = x) : <text>:1:2: unexpected '=='
1:  ==
     ^
Here is some sample data:
t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))
Here's my for loop:
library(dplyr)

y <- data.frame()
for(i in 1:ncol(t)-1) {
  n <- names(t[i])
  s <- paste(n, " == 1", sep="")
  r <- t %>% filter_( s ) %>%
    summarize( Avg = mean(AnnualSpend) )
  x <- cbind(n,r)
  y = rbind(x,y)
}
I did step through it with 1 as the column index and then used parse(text = s), which seemed to work just fine. I'm just a little confused about why it works when I do it manually but fails as a function. Thanks in advance for any help.
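The manual check described above probably looked something like the sketch below. It is a reconstruction, not the asker's actual code: it fixes `i <- 1` and evaluates the parsed string with base R's `eval()` in place of `filter_()` (an assumption made so the snippet runs without dplyr). With a valid index, the string `"Asthma == 1"` parses and filters correctly.

```r
# Sample data from the question
t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))

# One pass of the loop body with a hard-coded, valid index
i <- 1
n <- names(t[i])                  # "Asthma"
s <- paste(n, " == 1", sep = "")  # "Asthma == 1"
expr <- parse(text = s)[[1]]      # parses without error

# Evaluate the condition using the data frame's columns, roughly what filter_ does
rows <- eval(expr, t)
avg <- mean(t$AnnualSpend[rows])
print(avg)                        # 24622
```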
Answer:
When debugging code, print the output at each step. For instance, when I add print(i) at the end of the loop body, nothing is printed, which means the loop fails before reaching that step on its very first iteration, so the problem is upstream.
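A minimal sketch of that approach: log the loop state before the step that fails. Here the values go into a hypothetical `trace_log` vector instead of being printed, so they can be inspected afterwards, and the `filter_()` call is left out so the loop runs to completion without dplyr. The very first entry already shows `i = 0` and a filter string with no column name.

```r
t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))

# Record i and the generated filter string on every iteration,
# before the point where the original loop calls filter_()
trace_log <- character()
for (i in 1:ncol(t)-1) {
  n <- names(t[i])
  s <- paste(n, " == 1", sep = "")
  trace_log <- c(trace_log, paste0("i = ", i, "; filter string = '", s, "'"))
}
print(trace_log)
```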
Since the only thing upstream is the for statement, check what it iterates over:
1:ncol(t)-1
# [1] 0 1 2 3
Read the statement: it builds the sequence 1:ncol(t) and then subtracts 1 from every element, because : binds more tightly than -. Therefore i starts at zero, which is not a valid column index in R. You are missing parentheses around ncol(t)-1. Below is the corrected code.
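The precedence difference can be checked directly at the console. The last lines are an extra suggestion, not part of the original answer: iterating over the disease column names (collected in a hypothetical `disease_cols` vector) avoids index arithmetic altogether, assuming `AnnualSpend` is the only non-disease column.

```r
t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))

# `:` has higher precedence than binary `-`, so these two differ
wrong <- 1:ncol(t) - 1    # (1:4) - 1  -> 0 1 2 3
right <- 1:(ncol(t) - 1)  # 1:3        -> 1 2 3
print(wrong)
print(right)

# Alternative: loop over names instead of positions
disease_cols <- setdiff(names(t), "AnnualSpend")
print(disease_cols)       # "Asthma" "Diabetes" "CF"
```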
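library(dplyr)

y <- data.frame()
# The parentheses make i run over 1, 2, 3 (the disease columns) instead of 0 to 3
for(i in 1:(ncol(t)-1)) {
  n <- names(t[i])
  s <- paste(n, " == 1", sep="")
  r <- t %>% filter_( s ) %>%
    summarize( Avg = mean(AnnualSpend) )
  x <- cbind(n,r)
  y <- rbind(x,y)
}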
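`filter_()` and the other underscore verbs are deprecated in current dplyr. A sketch of the same loop without building and parsing a string, assuming a dplyr version that supports the `.data` pronoun (0.7 or later). Unlike the loop above, it appends rows, so they come out in column order.

```r
library(dplyr)

t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))

y <- data.frame()
for (n in setdiff(names(t), "AnnualSpend")) {
  # .data[[n]] refers to the column whose name is stored in n
  r <- t %>%
    filter(.data[[n]] == 1) %>%
    summarize(Avg = mean(AnnualSpend))
  y <- rbind(y, data.frame(n = n, r))
}
print(y)
```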
A for loop is not the most efficient way to do this, though.
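One loop-free alternative in base R, sketched under the assumption that every disease column is a strict 0/1 indicator (as in the sample data) and that `AnnualSpend` is the only other column. Multiplying the indicators by the spend keeps the spend only where a patient has the disease; dividing the column sums by the indicator counts gives the mean per disease. A disease with no patients would come out as `NaN` (0/0).

```r
t <- data.frame(Asthma = c(0, 1, 1, 0, 1),
                Diabetes = c(1, 0, 1, 0, 0),
                CF = c(1, 0, 0, 0, 0),
                AnnualSpend = c(12345, 23323, 50000, 10000, 543))

disease_cols <- setdiff(names(t), "AnnualSpend")
ind <- t[disease_cols]

# Each indicator column is multiplied element-wise by AnnualSpend
avg_spend <- colSums(ind * t$AnnualSpend) / colSums(ind)
print(avg_spend)
```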
Answer:
I'll walk you through finding where your code is broken.
You get the error in the filter_ line, so check what filter you are actually trying to apply:
s
# [1] " == 1"
Of course this is wrong: there is no column name before the ==. Why would it be this way? Check the code that generates it. What's wrong with it?
n
# character(0)
Huh, that doesn't make sense either; n should be one of the names of the data.frame. Okay, let's keep tracing back:
names(t)
# [1] "Asthma" "Diabetes" "CF" "AnnualSpend"
names(t)[i]
# character(0)
Okay, so it must be i:
i
# [1] 0
In R, indexes are 1-based, not 0-based, but you probably knew that. The reason i is 0 is that you specified 1:ncol(t)-1. Because : has higher precedence than -, that is effectively (1:ncol(t)) - 1, though I suspect you intended 1:(ncol(t)-1).
1:ncol(t) - 1
# [1] 0 1 2 3
1:(ncol(t) - 1)
# [1] 1 2 3
After changing that line, your results are:
y
# n Avg
# 1 CF 12345.0
# 2 Diabetes 31172.5
# 3 Asthma 24622.0
It's important to know not just what each part of the language does, but also how to step through the code (forwards or backwards) to find where a symptom turns into the source of the problem.