Reputation: 39
This is a question for school, but I have been working on it for hours and just need a point in the right direction. I am not asking for the full answer.
I was given a data frame with student grades for various assessments. I have to write a function that returns the number of columns whose names either start with a given prefix or match it exactly.
I was provided with the following framework:
assessmentCount <- function(df, assessmentNamePrefix)
{
}
I need to be able to write the code to get the exact results below when the following lines of code are executed:
assessmentCount(df,"hw")
# [1] 7
and
assessmentCount(df,"exam1")
# [1] 1
I've found that the following code, when run independently of the framework and with the [hw] written in, gives the correct number of 7:
my_columns <- df[, grep("^[hw]", names(df), value=TRUE)]
ncol(my_columns)
However, when I do the same with [exam1], I get an incorrect number of 3 because it includes columns for exam1, exam2, and exam3:
my_columns <- df[, grep("^[exam1]", names(df), value=TRUE)]
ncol(my_columns)
Even worse, when I put the code into the framework and replace the hard-coded value with the variable assessmentNamePrefix, I get an incorrect value of 8 for both tests:
assessmentCount <- function(df, assessmentNamePrefix)
{
my_columns <- df[, grep("^[assessmentNamePrefix]", names(df), value=TRUE)]
ncol(my_columns)
}
I am very frustrated at this point and am not understanding what is going wrong. I do realize that this is a very basic question, but I'm at the beginning of a very basic R programming course. Could someone please point me in the right direction? It would be very much appreciated. Thank you :)
Upvotes: 0
Views: 4524
Reputation: 214927
You can use the base startsWith() function, which is faster and more convenient than the regular expression grepl("^<prefix>", x) in this case, as stated in ?startsWith:
startsWith() is equivalent to but much faster than
substring(x, 1, nchar(prefix)) == prefix or also
grepl("^prefix", x)
assessmentCount <- function(df, assessmentNamePrefix)
{
sum(startsWith(names(df), assessmentNamePrefix))
}
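To see how this behaves, here is a minimal sketch on a made-up data frame. The column names and grades are assumptions for illustration only, since the real df from the assignment is not shown. A name that equals the prefix also counts, because a string starts with itself.

```r
# Hypothetical grade data; column names and values are invented for this example.
df <- data.frame(
  student = c("a", "b"),
  hw1 = c(90, 80), hw2 = c(85, 70), hw3 = c(88, 92), hw4 = c(75, 60),
  hw5 = c(95, 85), hw6 = c(70, 65), hw7 = c(80, 90),
  exam1 = c(78, 82), exam2 = c(88, 91), exam3 = c(69, 74)
)

assessmentCount <- function(df, assessmentNamePrefix) {
  # startsWith() gives one TRUE/FALSE per column name; sum() counts the TRUEs.
  sum(startsWith(names(df), assessmentNamePrefix))
}

assessmentCount(df, "hw")     # 7 on this made-up data
assessmentCount(df, "exam1")  # 1
assessmentCount(df, "exam")   # 3
```

Note that startsWith() was added in R 3.3.0, so older installations would need the grepl() form instead.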
Upvotes: 3
Reputation: 2082
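Your regex appears wrong: square brackets create a character class (any one of the listed characters), and a variable name written inside the quotes is taken literally rather than substituted. I think it should be: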
sum(grepl(paste0("^",assessmentNamePrefix),names(df)))
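A small sketch of why the bracketed patterns in the question miscount, and what the paste0() version does differently. The vector cols is a hypothetical set of column names assumed for illustration, not the real data.

```r
# Hypothetical column names, assumed for illustration.
cols <- c("student", "hw1", "hw2", "exam1", "exam2", "exam3", "attendance")

# "^[exam1]" is a character class: it matches any name whose FIRST character
# is one of e, x, a, m or 1, so "attendance" is picked up as well.
class_hits <- grep("^[exam1]", cols, value = TRUE)
class_hits    # "exam1" "exam2" "exam3" "attendance"

# Inside the quotes, assessmentNamePrefix is plain text, so this is a class
# made of the letters of that word, not the value of the variable.
literal_hits <- grep("^[assessmentNamePrefix]", cols, value = TRUE)
literal_hits  # "student" "exam1" "exam2" "exam3" "attendance"

# paste0() builds the pattern from the variable's value: "^exam1".
prefix <- "exam1"
pattern <- paste0("^", prefix)
sum(grepl(pattern, cols))       # 1

# Caveat: the prefix is still interpreted as a regex, so "." matches any character.
sum(grepl(paste0("^", "hw."), cols))  # 2, although no name contains a dot
sum(startsWith(cols, "hw."))          # 0, startsWith() compares literally
```

If prefixes might contain regex metacharacters, the literal comparison done by startsWith() is probably the safer choice.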
Upvotes: 2