Schweine Mareike

Reputation: 51

How can I repeat calculations on many single columns and store the output in a new data frame in R?

I have never used loops, but now I need to repeat one calculation multiple times and store the output in a new data frame.

I found some code that seems to fit, but it is not working. Here is an example with just a few columns:

df<-data.frame("running.nr" = 1:5,
               "spec1"= c(4,7,2,90,15),
               "spec2"= c(3,10,48,10,4),
               "spec3"= c(3,10,49,30,3),
               "spec4"= c(10,27,99,130,22),
               "n.id"= c(9,25,99,100,20))

This is the calculation I need to repeat for more than 50 columns. The output should be stored in a new data frame that also contains the "running.nr" column from df:

perc.comp1<-(df[,"spec1"]*100)/df$n.id
perc.comp2<-(df[,"spec2"]*100)/df$n.id
perc.comp3<-(df[,"spec3"]*100)/df$n.id
perc.comp4<-(df[,"spec4"]*100)/df$n.id

df.perc<-data.frame(df$running.nr,
                    perc.comp1,
                    perc.comp2,
                    perc.comp3,
                    perc.comp4)

This is the non-working loop I tried to make this code above less repetitive:

for(col in names(df)[2:5]) {
  df[paste0(col, "_pct")] = df[x] *100/ df$n.id}

This is the error message I get: "Error in [.data.frame(df, x) : object 'x' not found". I am also not sure whether the for loop does exactly what I want. Thanks for your time and help!

Upvotes: 2

Views: 58

Answers (3)

tmfmnk

Reputation: 39858

Or with dplyr, you can do:

library(dplyr)

df %>%
 mutate_at(vars(starts_with("spec")), list(~ . * 100/n.id))

  running.nr     spec1    spec2    spec3    spec4 n.id
1          1 44.444444 33.33333 33.33333 111.1111    9
2          2 28.000000 40.00000 40.00000 108.0000   25
3          3  2.020202 48.48485 49.49495 100.0000   99
4          4 90.000000 10.00000 30.00000 130.0000  100
5          5 75.000000 20.00000 15.00000 110.0000   20

If you want the results as new columns instead of overwriting the originals:

df %>%
 mutate_at(vars(starts_with("spec")), list(perc_comp = ~ . * 100/n.id))

  running.nr spec1 spec2 spec3 spec4 n.id spec1_perc_comp spec2_perc_comp spec3_perc_comp spec4_perc_comp
1          1     4     3     3    10    9       44.444444        33.33333        33.33333        111.1111
2          2     7    10    10    27   25       28.000000        40.00000        40.00000        108.0000
3          3     2    48    49    99   99        2.020202        48.48485        49.49495        100.0000
4          4    90    10    30   130  100       90.000000        10.00000        30.00000        130.0000
5          5    15     4     3    22   20       75.000000        20.00000        15.00000        110.0000

Or, if df contains only the species columns plus "running.nr" and "n.id":

df %>%
 mutate_at(vars(-matches("(running.nr)|(n.id)")), list(perc_comp = ~ . * 100/n.id))
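In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). Below is a minimal sketch of the same calculation with across(), assuming dplyr >= 1.0.0 is installed, that also builds the separate data frame the question asks for (running.nr plus one percentage column per species). The name df.perc and the "_pct" suffix are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

df.perc <- df %>%
  # across() applies the same formula to every selected column;
  # .names = "{.col}_pct" writes the results to new columns instead of overwriting
  mutate(across(starts_with("spec"), ~ .x * 100 / n.id, .names = "{.col}_pct")) %>%
  # keep only the identifier and the new percentage columns
  select(running.nr, ends_with("_pct"))

df.perc
```

across() evaluates the column selection once, before any new columns are added, so the freshly created spec*_pct columns are not processed a second time.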

Upvotes: 1

mischva11

Reputation: 2956

Although there are already good answers on how to solve this efficiently, I still want to help you with your code. Be aware that for loops in R are often slower than vectorised code, and dplyr (tmfmnk), the apply family, or direct calculation as Ronak Shah showed are usually faster, simpler and more idiomatic R. But since you sometimes do need loops, here is an explanation of your for loop.

The error message says that the object x does not exist, so df[x] cannot be evaluated. When you write a for loop, you declare the loop variable yourself; in your case it is col. The x you used is never defined anywhere, so the solution is a simple fix of a typo:

for(col in names(df)[2:5]) {
  df[paste0(col, "_pct")] = df[col] *100/ df$n.id
}

output:

  running.nr spec1 spec2 spec3 spec4 n.id spec1_pct spec2_pct spec3_pct spec4_pct
1          1     4     3     3    10    9 44.444444  33.33333  33.33333  111.1111
2          2     7    10    10    27   25 28.000000  40.00000  40.00000  108.0000
3          3     2    48    49    99   99  2.020202  48.48485  49.49495  100.0000
4          4    90    10    30   130  100 90.000000  10.00000  30.00000  130.0000
5          5    15     4     3    22   20 75.000000  20.00000  15.00000  110.0000
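The question also asks for the results in a new data frame rather than in df itself. Here is a minimal sketch of the same loop writing into a separate data frame. It selects the columns by their "spec" name prefix, which is an assumption based on the example data (names(df)[2:5] or any other selection works too), and it uses df[[col]], which returns the column as a plain vector instead of a one-column data frame.

```r
# Example data from the question
df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

# Start the new data frame with the identifier column only
df.perc <- data.frame(running.nr = df$running.nr)

# Loop over every column whose name starts with "spec"
for (col in grep("^spec", names(df), value = TRUE)) {
  # df[[col]] is a numeric vector, so the result is one new column
  df.perc[[paste0(col, "_pct")]] <- df[[col]] * 100 / df$n.id
}

df.perc
```

The original df is left unchanged, and each pass through the loop adds exactly one column to df.perc.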

Upvotes: 1

Ronak Shah

Reputation: 388862

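You could select the columns and perform this calculation directly. Dividing a data frame by a vector with one value per row, such as df$n.id, applies the division to every column, so no loop is needed: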

cols <- grep("spec", names(df), value = TRUE)
df[paste0(cols, "_pct")] <- (df[cols] * 100)/df$n.id

df
#  running.nr spec1 spec2 spec3 spec4 n.id spec1_pct spec2_pct spec3_pct spec4_pct
#1          1     4     3     3    10    9 44.444444  33.33333  33.33333  111.1111
#2          2     7    10    10    27   25 28.000000  40.00000  40.00000  108.0000
#3          3     2    48    49    99   99  2.020202  48.48485  49.49495  100.0000
#4          4    90    10    30   130  100 90.000000  10.00000  30.00000  130.0000
#5          5    15     4     3    22   20 75.000000  20.00000  15.00000  110.0000
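To get a separate data frame containing only running.nr and the percentages, as the question asks, the same vectorised calculation can be combined with data.frame(). This is a sketch that again assumes the species columns are the ones whose names start with "spec"; df.perc and the "_pct" suffix are illustrative names.

```r
# Example data from the question
df <- data.frame(running.nr = 1:5,
                 spec1 = c(4, 7, 2, 90, 15),
                 spec2 = c(3, 10, 48, 10, 4),
                 spec3 = c(3, 10, 49, 30, 3),
                 spec4 = c(10, 27, 99, 130, 22),
                 n.id  = c(9, 25, 99, 100, 20))

cols <- grep("^spec", names(df), value = TRUE)

# Data frame arithmetic: df$n.id is recycled down each selected column
pct <- df[cols] * 100 / df$n.id
names(pct) <- paste0(cols, "_pct")

# Combine the identifier with the percentage columns
df.perc <- data.frame(running.nr = df$running.nr, pct)

df.perc
```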

Upvotes: 2
