Reputation: 155
I have a tibble called 'Volume' in which I store some data (30 rows, 10 columns; the first 2 columns are character). Now I want to calculate the volume of every column relative to column 3 of my tibble. My current solution looks like this:
rel.Volume_unmod = tibble(
"Volume_OD" = Volume[[3]] / Volume[[3]],
"Volume_Imp" = Volume[[4]] / Volume[[3]],
"Volume_OD_1" = Volume[[5]] / Volume[[3]],
"Volume_WS_1" = Volume[[6]] / Volume[[3]],
"Volume_OD_2" = Volume[[7]] / Volume[[3]],
"Volume_WS_2" = Volume[[8]] / Volume[[3]],
"Volume_OD_3" = Volume[[9]] / Volume[[3]],
"Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod
I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relatively new to R, so it's not obvious to me. What I tried is something like this, but it doesn't run:
rel.Volume = NULL
for(i in Volume[,3:10]){
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}
Upvotes: 0
Views: 89
Reputation: 111
Without a minimal working example it's hard to guess what the variable Volume actually refers to. Apart from that, there seems to be a problem with your for-loop:
for(i in Volume[,3:10]){
Assuming Volume refers to a data.frame or tibble, this assigns the actual column vectors with indices 3 to 10 to i, one after another. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
Also, double brackets are usually used with lists, not data.frames or tibbles. (You can, however, use them here, because data.frames are special cases of lists.)
Last but not least, initialising the variable rel.Volume with NULL does not give you what you want: you haven't told R what rel.Volume should be, so assigning into it with single brackets produces a plain vector or list rather than a tibble.
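The difference between looping over columns and looping over column positions can be checked on a small, made-up data frame. This is only a sketch in base R; the name toy and its columns are hypothetical and not taken from the question:

```r
# Hypothetical toy data, not the asker's tibble
toy <- data.frame(id = c("a", "b"), v1 = c(2, 4), v2 = c(1, 8))

# Looping over the data frame itself hands each column vector to i
seen_values <- list()
for (i in toy[, 2:3]) {
  seen_values[[length(seen_values) + 1]] <- i
}

# Looping over 2:3 hands the column positions to i
seen_index <- integer(0)
for (i in 2:3) {
  seen_index <- c(seen_index, i)
}

seen_values  # the contents of columns v1 and v2
seen_index   # the positions 2 and 3
```

Only the second form gives an i that can be used in an expression such as toy[[i]].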
Try this, if you like (thanks @Edo for example data):
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE),
                     Vol1 = rnorm(30),
                     Vol2 = rnorm(30),
                     Vol3 = rnorm(30))
rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.
for (i in 3:ncol(Volume)) { # ncol gives the total number of columns in data.frame
  rel.Volume[i] <- Volume[i] / Volume[3]
}
A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same values without a loop:
# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))
rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))
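If keeping the column names matters, one possible alternative (a sketch, not part of the answer as originally written) is to divide the data.frame of numeric columns by the reference column taken as a plain vector with [[3]]. The name rel.V.4 is made up for this illustration, and the example data from above is repeated so the snippet runs on its own:

```r
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
                     GR = sample(LETTERS, 30, TRUE),
                     Vol1 = rnorm(30),
                     Vol2 = rnorm(30),
                     Vol3 = rnorm(30))

# data.frame / vector is applied column by column when the vector
# has one value per row, so every column is divided by Vol1
rel.V.4 <- Volume[3:5] / Volume[[3]]

# Put the ID columns back in front; the names Vol1..Vol3 are preserved
rel.V.4 <- cbind(Volume[1:2], rel.V.4)
names(rel.V.4)
```

The design choice here is to use [[3]] rather than [3]: the former yields a numeric vector, the latter a one-column data.frame.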
Since you said you were new to R, frankly I would recommend avoiding the Tidyverse packages while you are still learning the basics. In my experience, in the long run you're better off learning base R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse functions later (but then, why would anybody? ;-) ).
Upvotes: 0
Reputation: 7818
Since you did not provide any data, I've followed your description to create some mock-up data. Here:
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)
library(dplyr)
# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols
# divide by Volume_OD
rel.Volume_unmod <- Volume %>%
  mutate(across(all_of(cols), ~ . / Volume_OD))
# result
rel.Volume_unmod
rel.Volume_unmod: to avoid any problems I renamed the columns (rather brute-force). You can do it with dplyr::rename if you want to.
mutate: a verb from dplyr that allows you to create new columns or perform operations or functions on columns.
across: an adverb from dplyr. Let's simplify by saying that it's a function that allows you to perform a function over multiple columns. In this case I want to perform a division by Volume_OD.
~: a tidyverse way to create anonymous functions. ~ . / Volume_OD is equivalent to function(x) x / Volume_OD.
all_of: necessary because in this specific case I'm providing across with a vector of characters. Without it, it will work anyway, but you will receive a warning because it's ambiguous and it may work incorrectly in some cases.
Check out this book to learn more about data manipulation with the tidyverse (which dplyr is part of).
rel.Volume_unmod <- Volume
# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols
# divide by column 3
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[3])
rel.Volume_unmod
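The pieces used in this base R solution can be tried in isolation on a small, made-up data frame (toy and its column names are hypothetical). Note one assumption that differs from the call above: the sketch divides by toy[["ref"]], i.e. the reference column as a plain vector, so that every resulting column is a simple numeric vector:

```r
# Hypothetical toy data
toy <- data.frame(id  = c("a", "b", "c"),
                  ref = c(2, 4, 5),
                  x   = c(1, 8, 10))

# `/` is an ordinary function and can be called with prefix syntax
`/`(6, 3)  # same as 6 / 3

# lapply passes the extra argument toy[["ref"]] on to `/` as the divisor
ratios <- lapply(toy[2:3], `/`, toy[["ref"]])
class(ratios)  # a plain list

# Assigning through `[<-` replaces the columns but keeps the data.frame class
toy[2:3] <- ratios
class(toy)
toy
```

Because the right-hand side is evaluated before the assignment happens, the reference column is still intact while the ratios are being computed.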
lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] at a time and applies a function.
/: you usually see / used like this: A / B, but actually / is a primitive function. You could write the same thing in this way: `/`(A, B) # same as A / B
lapply can be provided with additional parameters that are passed directly to the function that is being applied over the list (in this case /). Therefore, we are writing rel.Volume_unmod[3] as an additional parameter.
lapply always returns a list. But, since we are assigning the result of lapply to a "fraction of a dataframe", we will just edit the columns of the dataframe and, as a result, we will have a dataframe instead of a list. Let me rephrase in a more technical way. When you are assigning rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]. You are technically using this assignment function: [<-. This is a function that allows you to edit the items in a list/vector/dataframe. Specifically, [<- allows you to assign new items without modifying the attributes of the list/vector/dataframe. As I said before, a dataframe is just a list with specific attributes. So when you use [<- you modify the columns, but you leave the attributes (the class data.frame in this case) untouched. That's why the magic works.
Upvotes: 2