Carlo
Carlo

Reputation: 155

Creating a simple for loop in R

I have a tibble called 'Volume' in which I store some data (10 columns - the first 2 columns are characters, 30 rows). Now I want to calculate the relative Volume of every column that corresponds to Column 3 of my tibble. My current solution looks like this:

rel.Volume_unmod = tibble(
            "Volume_OD" = Volume[[3]] / Volume[[3]],
            "Volume_Imp" = Volume[[4]] / Volume[[3]],
            "Volume_OD_1" = Volume[[5]] / Volume[[3]],
            "Volume_WS_1" = Volume[[6]] / Volume[[3]],
            "Volume_OD_2"  = Volume[[7]] / Volume[[3]],
            "Volume_WS_2" = Volume[[8]] / Volume[[3]], 
            "Volume_OD_3" = Volume[[9]] / Volume[[3]],
            "Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod 

I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relative new to R so I it's not obvious to me. What I tried is something like this, but I can't actually run this:

rel.Volume = NULL
for(i in Volume[,3:10]){

rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}

Upvotes: 0

Views: 89

Answers (2)

Codebird
Codebird

Reputation: 111

Whithout a minimal working example it's hard to guess what the Variable Volume actually refers to. Apart from that there seems to be a problem with your for-loop:

for(i in Volume[,3:10]){

Assuming Volume refers to a data.frame or tibble, this causes the actual column-vectors with indices between 3 and 10 to be assigned to i successively. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):

rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])

Also, two brackets are usually used with lists, not data.frames or tibbles. (You can, however, do so, because data.frames are special cases of lists.)

Last but not least, initialising the variable rel.Volume with NULL will result in an error, when trying to reassign to that variable, since you haven't told R, what rel.Volume should be.

Try this, if you like (thanks @Edo for example data):

set.seed(1)

Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE),
                     Vol1 = rnorm(30),
                     Vol2 = rnorm(30),
                     Vol3 = rnorm(30))

rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.

for (i in 3:ncol(Volume)){ # ncol gives the total number of columns in data.frame
  rel.Volume[i]  = Volume[i]/Volume[3]
}

A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same result without a loop:

# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))

rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))

Since you said you were new to R, frankly I would recommend avoiding the Tidyverse-packages while you are still learing the basics. From my experience, in the long run you're better off learning base-R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse-functions later (but then, why would anybody? ;-) ).

Upvotes: 0

Edo
Edo

Reputation: 7818

Mockup Data

Since you did not provide some data, I've followed the description you provided to create some mockup data. Here:

set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)

Solution with Dplyr

library(dplyr)

# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols

# divide by Volumn_OD
rel.Volume_unmod <- Volume %>% 
  mutate(across(all_of(cols), ~ . / Volume_OD))

# result
rel.Volume_unmod

Explanation

  • I don't know the names of your columns. Probably, the names correspond to the names of the columns you intended to create in rel.Volume_unmod. Anyhow, to avoid any problem I renamed the columns (kinda brutally). You can do it with dplyr::rename if you wan to.
  • There are many ways to select the columns you want to mutate. mutate is a verb from dplyr that allows you to create new columns or perform operations or functions on columns.
  • across is an adverb from dplyr. Let's simplify by saying that it's a function that allows you to perform a function over multiple columns. In this case I want to perform a division by Volum_OD.
  • ~ is a tidyverse way to create anonymous functions. ~ . / Volum_OD is equivalent to function(x) x / Volumn_OD
  • all_of is necessary because in this specific case I'm providing across with a vector of characters. Without it, it will work anyway, but you will receive a warning because it's ambiguous and it may work incorrectly in same cases.

More info

Check out this book to learn more about data manipulation with tidyverse (which dplyr is part of).


Solution with Base-R

rel.Volume_unmod <- Volume

# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols

# divide by columns 3
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[3])
rel.Volume_unmod

Explanation

  • lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
  • in this case rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] a time and applies a function.
  • the function is /. You usually see / used like this: A / B, but actually / is a Primitive function. You could write the same thing in this way:
 `/`(A, B) # same as A / B
  • lapply can be provided with additional parameters that are passed directly to the function that is being applied over the list (in this case /). Therefore, we are writing rel.Volume_unmod[3] as additional parameter.
  • lapply always returns a list. But, since we are assigning the result of lapply to a "fraction of a dataframe", we will just edit the columns of the dataframe and, as a result, we will have a dataframe instead of a list. Let me rephrase in a more technical way. When you are assigning rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]. You are technically using this assigning function: [<-. This is a function that allows to edit the items in a list/vector/dataframe. Specifically, [<- allows you to assign new items without modifying the attributes of the list/vector/dataframe. As I said before, a dataframe is just a list with specific attributes. Then when you use [<- you modify the columns, but you leave the attributes (the class data.frame in this case) untouched. That's why the magic works.

Upvotes: 2

Related Questions