Reputation: 51
I have never used loops, but now I need to repeat one calculation many times and store the output in a new data frame.
I found some code that seems to fit, but it does not work. Here is an example with just a few columns:
df<-data.frame("running.nr" = 1:5,
"spec1"= c(4,7,2,90,15),
"spec2"= c(3,10,48,10,4),
"spec3"= c(3,10,49,30,3),
"spec4"= c(10,27,99,130,22),
"n.id"= c(9,25,99,100,20))
This is the calculation I need to repeat for more than 50 columns. The output should be stored in a new data frame that also contains the "running.nr" column from df:
perc.comp1<-(df[,"spec1"]*100)/df$n.id
perc.comp2<-(df[,"spec2"]*100)/df$n.id
perc.comp3<-(df[,"spec3"]*100)/df$n.id
perc.comp4<-(df[,"spec4"]*100)/df$n.id
df.perc<-data.frame(df$running.nr,
perc.comp1,
perc.comp2,
perc.comp3,
perc.comp4)
This is the loop I tried in order to make the code above less repetitive, but it does not work:
for(col in names(df)[2:5]) {
df[paste0(col, "_pct")] = df[x] *100/ df$n.id}
This is the error message I get:
"Error in `[.data.frame`(df, x) : object 'x' not found"
I am also not sure whether the for loop would produce exactly what I want. Thanks for your time and help!
Upvotes: 2
Views: 58
Reputation: 39858
Or with dplyr, you can do:
library(dplyr)
df %>%
mutate_at(vars(starts_with("spec")), list(~ . * 100/n.id))
running.nr spec1 spec2 spec3 spec4 n.id
1 1 44.444444 33.33333 33.33333 111.1111 9
2 2 28.000000 40.00000 40.00000 108.0000 25
3 3 2.020202 48.48485 49.49495 100.0000 99
4 4 90.000000 10.00000 30.00000 130.0000 100
5 5 75.000000 20.00000 15.00000 110.0000 20
If you want the results as new columns instead:
df %>%
mutate_at(vars(starts_with("spec")), list(perc_comp = ~ . * 100/n.id))
running.nr spec1 spec2 spec3 spec4 n.id spec1_perc_comp spec2_perc_comp spec3_perc_comp spec4_perc_comp
1 1 4 3 3 10 9 44.444444 33.33333 33.33333 111.1111
2 2 7 10 10 27 25 28.000000 40.00000 40.00000 108.0000
3 3 2 48 49 99 99 2.020202 48.48485 49.49495 100.0000
4 4 90 10 30 130 100 90.000000 10.00000 30.00000 130.0000
5 5 15 4 3 22 20 75.000000 20.00000 15.00000 110.0000
Or, if df contains only the species columns plus "running.nr" and "n.id":
df %>%
mutate_at(vars(-matches("(running.nr)|(n.id)")), list(perc_comp = ~ . * 100/n.id))
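For reference, here is a sketch of the same variants using across(), which supersedes mutate_at() from dplyr 1.0.0 onwards. It assumes dplyr 1.0.0 or later is installed; the object names in_place and new_cols are hypothetical, and the last step keeps only running.nr plus the new percentage columns, since the question asks for a separate data frame:

```r
library(dplyr)

df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

# Overwrite the spec columns in place (same idea as the first mutate_at call)
in_place <- df %>%
  mutate(across(starts_with("spec"), ~ .x * 100 / n.id))

# Keep the originals and add spec1_perc_comp, spec2_perc_comp, ...
new_cols <- df %>%
  mutate(across(starts_with("spec"), ~ .x * 100 / n.id,
                .names = "{.col}_perc_comp"))

# Separate data frame: running.nr plus the percentage columns only
df.perc <- new_cols %>%
  select(running.nr, ends_with("_perc_comp"))

df.perc
```

Selecting by prefix with starts_with("spec") means the same code works unchanged for 50 or more spec columns.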
Upvotes: 1
Reputation: 2956
Although there are already good answers on how to solve this efficiently, I still want to help you with your code. Be aware that for loops in R are often slower than the alternatives: dplyr (tmfmnk), apply, or direct vectorised calculation as Ronak Shah showed are usually faster, simpler, and more idiomatic R. But since you sometimes do need loops, here is an explanation of what went wrong in yours.
The error message says that the object x cannot be found, so df[x] cannot be evaluated. In a for loop, you declare the loop variable yourself; in your case it is col. The x you used is never defined anywhere, so the fix is simply to correct that typo:
for(col in names(df)[2:5]) {
df[paste0(col, "_pct")] = df[col] *100/ df$n.id
}
output:
running.nr spec1 spec2 spec3 spec4 n.id spec1_pct spec2_pct spec3_pct spec4_pct
1 1 4 3 3 10 9 44.444444 33.33333 33.33333 111.1111
2 2 7 10 10 27 25 28.000000 40.00000 40.00000 108.0000
3 3 2 48 49 99 99 2.020202 48.48485 49.49495 100.0000
4 4 90 10 30 130 100 90.000000 10.00000 30.00000 130.0000
5 5 15 4 3 22 20 75.000000 20.00000 15.00000 110.0000
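If you want the result in a separate data frame, as the question asks, the same loop can write into a new object instead of df. A minimal sketch, assuming the new columns should be named perc.comp1 to perc.comp4 as in the question's hand-written code, and selecting the columns by the "spec" prefix rather than by position 2:5 (the helper name spec_cols is hypothetical):

```r
df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

# Columns to convert, matched by name prefix so the loop scales to 50+ columns
spec_cols <- grep("^spec", names(df), value = TRUE)

# Start the new data frame with the running.nr column
df.perc <- data.frame(running.nr = df$running.nr)

for (i in seq_along(spec_cols)) {
  col <- spec_cols[i]
  # [[ ]] extracts the column as a plain numeric vector
  # (df[col] would return a one-column data frame instead)
  df.perc[[paste0("perc.comp", i)]] <- df[[col]] * 100 / df$n.id
}

df.perc
```

Here df itself is left untouched; only df.perc receives the percentages.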
Upvotes: 1
Reputation: 388862
You could select the columns and perform the calculation directly, without a loop:
cols <- grep("spec", names(df), value = TRUE)
df[paste0(cols, "_pct")] <- (df[cols] * 100)/df$n.id
df
# running.nr spec1 spec2 spec3 spec4 n.id spec1_pct spec2_pct spec3_pct spec4_pct
#1 1 4 3 3 10 9 44.444444 33.33333 33.33333 111.1111
#2 2 7 10 10 27 25 28.000000 40.00000 40.00000 108.0000
#3 3 2 48 49 99 99 2.020202 48.48485 49.49495 100.0000
#4 4 90 10 30 130 100 90.000000 10.00000 30.00000 130.0000
#5 5 15 4 3 22 20 75.000000 20.00000 15.00000 110.0000
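This works without a loop because dividing a data frame by a vector whose length equals the number of rows recycles that vector down each column, so every value in row i is divided by n.id[i]. A sketch that uses the same idea to build the separate data frame asked for in the question, with running.nr first and the question's perc.comp names (the intermediate object pct is hypothetical):

```r
df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

cols <- grep("^spec", names(df), value = TRUE)

# n.id (length 5) is recycled down each selected column, row by row
pct <- df[cols] * 100 / df$n.id
names(pct) <- paste0("perc.comp", seq_along(cols))

# New data frame: running.nr plus the percentage columns; df is unchanged
df.perc <- data.frame(running.nr = df$running.nr, pct)
df.perc
```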
Upvotes: 2