Reputation: 459
I'm having some trouble understanding how R handles subsetting internally and this is causing me some issues while trying to build some functions. Take the following code:
f <- function(directory, variable, number_seq) {
  ## Create an empty data frame
  new_frame <- data.frame()
  ## Add every data frame in the directory whose name is in number_seq to new_frame;
  ## the file variable specifies the path to the file
  for (i in number_seq) {
    file <- paste("~/", directory, "/", sprintf("%03d", i), ".csv", sep = "")
    x <- read.csv(file)
    new_frame <- rbind.data.frame(new_frame, x)
  }
  ## Calculate and return the mean
  mean(new_frame[, variable], na.rm = TRUE)
}
When calculating the mean, I first tried to subset using the $ sign, new_frame$variable, and the subset function, subset(new_frame, select = variable), but both only returned a None value. It only worked when I used new_frame[, variable].
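For a reproducible version of the setup above, here is a minimal sketch. It assumes a hypothetical temporary directory with made-up CSV contents, and a hypothetical variant f_demo() that takes a full directory path instead of a folder under ~/. It indexes with [[variable]], which, like [, variable], uses the string stored in the argument.

```r
# Hypothetical test data: write 001.csv, 002.csv, 003.csv into a temporary directory
dir <- file.path(tempdir(), "specdata_demo")   # hypothetical directory name
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = c(i, NA, i * 10)),
            file.path(dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

# Hypothetical variant of f() that takes a full directory path
f_demo <- function(directory, variable, number_seq) {
  # Build the file paths and stack all files into one data frame
  files <- file.path(directory, sprintf("%03d.csv", number_seq))
  new_frame <- do.call(rbind, lapply(files, read.csv))
  # `variable` holds a column name as a string, so index with [[ ]] (or [, ])
  mean(new_frame[[variable]], na.rm = TRUE)
}

f_demo(dir, "value", 1:3)   # 11
```

Reading all files first and binding them once with do.call(rbind, ...) avoids growing new_frame inside the loop; the subsetting behavior is the same either way.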
Can anyone explain why the other two ways of subsetting didn't work? It took me a long time to figure this out, and even though I got it working, I still don't understand why the other approaches failed. I'd really like to look inside the black box so I don't run into the same issues in the future.
Thanks for the help.
Upvotes: 2
Views: 71
Reputation: 83215
This behavior has to do with the fact that you are subsetting inside a function.
Both new_frame$variable and subset(new_frame, select = variable) first look for a column in the data frame with the name variable.
On the other hand, new_frame[, variable] uses the value of the variable argument of f(directory, variable, number_seq) to select the column.
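A minimal sketch of the difference inside a function, assuming a small made-up data frame in place of the combined CSV data and a hypothetical helper g():

```r
# Hypothetical stand-in for the combined CSV data
new_frame <- data.frame(id = 1:4, value = c(2, 4, NA, 6))

g <- function(df, variable) {
  list(
    dollar  = df$variable,     # looks for a column literally named "variable"
    bracket = df[, variable],  # uses the value of the argument, here "value"
    double  = df[[variable]]   # also uses the value of the argument
  )
}

res <- g(new_frame, "value")
is.null(res$dollar)               # TRUE: there is no column called "variable"
mean(res$bracket, na.rm = TRUE)   # 4
```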
Upvotes: 1
Reputation: 206197
The dollar sign ($) can only be used with literal column names. That avoids confusion in cases like this:
dd <- data.frame(
  id = 1:4,
  var = rnorm(4),
  value = runif(4)
)
var <- "value"
dd$var
In this case, if $ accepted both variables and column names, which would you expect: the dd$var column, or the dd$value column (because var == "value")? That's why dd[, var] behaves differently: it evaluates its argument normally, so it receives a character vector, not an expression referring to a column name. You will get dd$value with dd[, var].
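A small sketch that checks this with the dd example above (set.seed() is added only so the random columns are reproducible):

```r
set.seed(1)  # only for reproducible random columns
dd <- data.frame(id = 1:4, var = rnorm(4), value = runif(4))
var <- "value"

identical(dd$var, dd[["var"]])   # TRUE: $ uses the literal name "var"
identical(dd[, var], dd$value)   # TRUE: [ evaluates var, giving "value"
identical(dd[[var]], dd$value)   # TRUE: [[ also evaluates its argument
```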
I'm not quite sure why you got None with subset(); I was unable to replicate that problem.
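One possible explanation, offered as an assumption since the original data aren't shown: when no column is named variable, subset() falls back to the string stored in the argument and does select the column, but it returns a one-column data frame rather than a vector, and mean() returns NA for a data frame. A sketch with a made-up data frame and a hypothetical helper h():

```r
# Hypothetical stand-in for the combined CSV data
new_frame <- data.frame(id = 1:4, value = c(2, 4, NA, 6))

h <- function(df, variable) {
  # No column is named "variable", so the string held in `variable` is used
  subset(df, select = variable)
}

sel <- h(new_frame, "value")
class(sel)                     # "data.frame": one column, not a vector
mean(sel, na.rm = TRUE)        # NA, with a warning: mean() does not average data frames
mean(sel[[1]], na.rm = TRUE)   # 4
```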
Upvotes: 1