Reputation: 819
I have some data that contains 400+ columns and ~80 observations. I would like to use a for loop to go through each column and, if it contains the desired prefix exp_, I would like to create a new column which is that value divided by a reference column, stored as the same name but with a suffix _pp. I'd also like to do an else if with the other prefix rev_ but I think as long as I can get the first problem figured out I can solve the rest myself. Some example data is below:
exp_alpha exp_bravo rev_charlie rev_delta pupils
10 28 38 95 2
24 56 39 24 5
94 50 95 45 3
15 93 72 83 9
72 66 10 12 3
The first time I tried it, the loop ran through properly but only stored the final column in which the if statement was true, rather than storing each column in which the if statement was true. I made some tweaks and lost that code but now have this which runs without error but doesn't modify the data frame at all.
for (i in colnames(test)) {
  if(grepl("exp_", colnames(test)[i])) {
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}
My understanding of what this is doing:
Since my code executes without error but doesn't do anything, I imagine my problem is in the if() statement, but I can't figure out what I'm doing wrong. I also tried adding "==TRUE" in the if() statement but that achieved the same result.
Upvotes: 5
Views: 2937
Reputation: 28329
Linear solution:
Don't use a loop for that! You can linearize (vectorize) your code so that it handles all matching columns at once, which is typically faster than looping over columns. Here's how to do it:
# Extract column names
cNames <- colnames(test)
# Find columns whose names start with the exp_ prefix
foo <- grep("^exp_", cNames)
# Divide by reference: ALL columns at the SAME time
# (drop = FALSE keeps a data frame even if only one column matches)
bar <- test[, foo, drop = FALSE] / test$pupils
# Append the _pp suffix to the names: ALL columns at the SAME time
colnames(bar) <- paste(cNames[foo], "pp", sep = "_")
# Add to original dataset instead of iteratively appending
test <- cbind(test, bar)
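If you want to try this idea on the sample data from the question, here is a minimal self-contained sketch. It assumes the data frame is called test and the reference column is pupils, as in the question. The helper name add_per_pupil and its arguments are made up for illustration, and the rev_ prefix is simply handled the same way as exp_.

```r
# Sample data copied from the question
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

# Hypothetical helper (not part of the original answer): divides every column
# whose name starts with one of the prefixes by the reference column and
# appends the results under the original name plus a "_pp" suffix.
add_per_pupil <- function(df, prefixes = c("exp_", "rev_"), ref = "pupils") {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
  hits <- grep(pattern, colnames(df), value = TRUE)
  if (length(hits) == 0) return(df)
  # drop = FALSE keeps a data frame even when only one column matches
  per_pupil <- df[, hits, drop = FALSE] / df[[ref]]
  colnames(per_pupil) <- paste(hits, "pp", sep = "_")
  cbind(df, per_pupil)
}

test <- add_per_pupil(test)
colnames(test)
```

With the sample data this should leave the five original columns in place and add exp_alpha_pp, exp_bravo_pp, rev_charlie_pp and rev_delta_pp.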
Upvotes: 1
Reputation: 10301
As an alternative to @timfaber's answer, you can keep your first line the same but not treat i as an index:
for (i in colnames(test)) {
  if(grepl("exp_", i)) {
    print(i)
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}
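To see why the version in the question never entered the if() block, here is a small sketch. The vector cols is a made-up stand-in for colnames(test); the point is only that a plain character vector has no names, so indexing it with a string yields NA.

```r
# Stand-in for colnames(test): a character vector without names
cols <- c("exp_alpha", "rev_charlie", "pupils")

# Indexing by a string looks for an element *named* "exp_alpha"; none exists
by_name <- cols["exp_alpha"]
is.na(by_name)              # TRUE

# grepl() returns FALSE for an NA input, so the if() body is skipped
grepl("exp_", by_name)      # FALSE

# Testing the column name itself behaves as intended
grepl("exp_", "exp_alpha")  # TRUE
```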
Upvotes: 2
Reputation: 2070
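Almost correct. Your loop iterates over the column names themselves, so colnames(test)[i] indexes by a name, returns NA, and the condition is never true. Loop over the column positions instead: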
for (i in seq_along(colnames(test))) {
  if(grepl("exp_", colnames(test)[i])) {
    # Build the new name from the column name, not from the numeric index
    test[paste(colnames(test)[i],"pp", sep="_")] <- test[i] / test$pupils
  }
}
Upvotes: 3