rcmed2024

Reputation: 23

Creating multiple new variables using Vectors, existing variables, and mapply in R

I am pretty new to R and am attempting to create new columns/variables in my data set, df, using information from multiple columns that already exist in it. I was hoping to use the mapply function to carry this out. The data are measurements taken on the right side and on the left side of a person. Only one of these sides is affected, however, and it is identified by df$laterality. Ultimately, I would like to create new variables/columns that hold the measurements taken on the affected side.

My data, simplified, essentially looks like the following

recordID <- c(1, 2, 3, 4)
laterality <- c("right", "right", "left", "right")
right_1_measure <- c(2.3, 3.4, 1.7, 2.4)
right_2_measure <- c(1.3, 2.2, 3.1, 4.1)
right_3_measure <- c(2.7, 2.8, 4.2, 3.9)
left_1_measure <- c(1.5, 2.6, 4.5, 2.8)
left_2_measure <- c(1.1, 3.4, 3.5, 2.6)
left_3_measure <- c(2.6, 2.8, 3.6, 1.6)

df <- data.frame(recordID, laterality, right_1_measure, right_2_measure, right_3_measure, left_1_measure, left_2_measure, left_3_measure)

I then created vectors of the column names I wished to cycle through to make the new "affected" variables/columns. I also created a vector of the names I hoped to give the new columns, which follow the existing names but use the prefix "aff" in place of the side.

right_vars <- c("right_1_measure", "right_2_measure" , "right_3_measure")
left_vars <- c("left_1_measure", "left_2_measure" , "left_3_measure")
aff_vars <- c("aff_1_measure", "aff_2_measure", "aff_3_measure")

I then created the function which I was planning to use to conditionally create the new columns based on df$laterality

aff_var_create <- function (x, y, z){
  df$x <- ifelse(df$laterality == "Right" , df$y, ifelse (df$laterality == "Left", df$z, NA))
}

Then I created my mapply code

mapply(FUN = aff_var_create, x = aff_vars, y = right_vars, z = left_vars)

However, when I run this I receive the following error message:

Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
  replacement has length zero
In addition: Warning message:
In rep(yes, length.out = len) :
 Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
  replacement has length zero 

I've checked my data frame and all columns have data in them, so I am confused as to why the replacement at ypos has zero length.

Ultimately, I would like my data frame to look like the following

recordID <- c(1, 2, 3, 4)
laterality <- c("right", "right", "left", "right")
right_1_measure <- c(2.3, 3.4, 1.7, 2.4)
right_2_measure <- c(1.3, 2.2, 3.1, 4.1)
right_3_measure <- c(2.7, 2.8, 4.2, 3.9)
left_1_measure <- c(1.5, 2.6, 4.5, 2.8)
left_2_measure <- c(1.1, 3.4, 3.5, 2.6)
left_3_measure <- c(2.6, 2.8, 3.6, 1.6)
aff_1_measure <- c(2.3, 3.4, 4.5, 2.4)
aff_2_measure <- c(1.3, 2.2, 3.5, 4.1)
aff_3_measure <- c(2.7, 2.8, 3.6, 3.9)

df <- data.frame(recordID, laterality, right_1_measure, right_2_measure, right_3_measure, left_1_measure, left_2_measure, left_3_measure, aff_1_measure, aff_2_measure, aff_3_measure)

Any suggestions for fixing this issue, or for another method that achieves a similar result, would be much appreciated! Thank you.

Upvotes: 2

Views: 146

Answers (2)

Parfait

Reputation: 107687

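You cannot pass a column name held in a variable to the $ notation: df$y looks for a column literally named y and returns NULL, which is where the zero-length replacement comes from. Use [[ instead. The comparison strings must also match the case used in the data ("right", not "Right"). Finally, since mapply does not update the data frame in place, you need to assign its result to the new columns: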

right_vars <- c("right_1_measure", "right_2_measure" , "right_3_measure")
left_vars <- c("left_1_measure", "left_2_measure" , "left_3_measure")
aff_vars <- c("aff_1_measure", "aff_2_measure", "aff_3_measure")

aff_var_create <- function(x, y, z){
  ifelse(df$laterality == "right" , df[[y]], ifelse(df$laterality == "left", df[[z]], NA))
}

df[aff_vars] <- mapply(FUN=aff_var_create, x=aff_vars, y=right_vars, z=left_vars)

df
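To see why the original function raised "replacement has length zero", here is a minimal sketch on a two-row toy data frame. The names toy, col_name, by_dollar and by_bracket are illustrative only and do not come from the question.

```r
# Toy data frame; all names here are illustrative
toy <- data.frame(laterality = c("right", "left"),
                  right_1_measure = c(2.3, 1.7),
                  stringsAsFactors = FALSE)
col_name <- "right_1_measure"

# $ does not evaluate col_name; it looks for a column called "col_name"
by_dollar <- toy$col_name
is.null(by_dollar)        # TRUE

# [[ evaluates col_name first, so the real column is returned
by_bracket <- toy[[col_name]]
by_bracket                # 2.3 1.7
```

Passing that NULL as the yes argument of ifelse is what leaves nothing to assign into the result.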

Alternatively, assign by indexing with [.

aff_cols <- paste0("aff_", 1:3, "_measure")
right_cols <- paste0("right_", 1:3, "_measure")
left_cols <- paste0("left_", 1:3, "_measure")
curr_logic <- df$laterality == "right"

# INITIALIZE COLUMNS
df[aff_cols] <- NA

# UPDATE COLUMNS BY INDEX
df[curr_logic , aff_cols] <- df[curr_logic , right_cols]
df[!curr_logic , aff_cols] <- df[!curr_logic, left_cols]

df
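Note that the index version above treats every row that is not "right" as "left". As a hedged variant, assuming laterality could also hold NA or some other value (as the NA branch in the question's function suggests), both sets of rows can be selected explicitly. The data frame df_demo and the names right_rows and left_rows are illustrative.

```r
# Illustrative data: row 3 has a missing laterality
df_demo <- data.frame(
  laterality      = c("right", "left", NA),
  right_1_measure = c(2.3, 1.7, 2.4),
  left_1_measure  = c(1.5, 4.5, 2.8),
  stringsAsFactors = FALSE
)

# which() drops NA comparisons, so the row indices are safe for assignment
right_rows <- which(df_demo$laterality == "right")
left_rows  <- which(df_demo$laterality == "left")

# Start from NA so unmatched rows stay missing
df_demo$aff_1_measure <- NA_real_
df_demo[right_rows, "aff_1_measure"] <- df_demo[right_rows, "right_1_measure"]
df_demo[left_rows,  "aff_1_measure"] <- df_demo[left_rows,  "left_1_measure"]

df_demo$aff_1_measure   # 2.3 4.5 NA
```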

Even better, use a single ifelse call. It works element-wise on matrices as long as the test and both value arguments have the same dimensions, so the logical vector is expanded to three columns with replicate.

aff_cols <- paste0("aff_", 1:3, "_measure")
right_cols <- paste0("right_", 1:3, "_measure")
left_cols <- paste0("left_", 1:3, "_measure")
curr_logic <- df$laterality == "right"

df[aff_cols] <- ifelse(replicate(3, curr_logic), 
                       as.matrix(df[right_cols]), 
                       as.matrix(df[left_cols]))

df
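The reason replicate is needed: ifelse returns a result shaped like its test argument, so a plain logical vector would yield only one value per row rather than one per cell. A minimal sketch with made-up values (test_vec, yes_mat, no_mat, short and full are illustrative names):

```r
test_vec <- c(TRUE, FALSE)
yes_mat  <- matrix(1:4, nrow = 2)      # columns: (1, 2) and (3, 4)
no_mat   <- matrix(11:14, nrow = 2)    # columns: (11, 12) and (13, 14)

# With a plain vector as the test, only length(test_vec) values come back
short <- ifelse(test_vec, yes_mat, no_mat)
length(short)             # 2

# Expanding the test to a 2 x 2 matrix gives one result per cell
test_mat <- replicate(2, test_vec)
full <- ifelse(test_mat, yes_mat, no_mat)
dim(full)                 # 2 2
```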

Upvotes: 2

Martin Gal

Reputation: 16988

It's not a mapply solution, but for this kind of data work I recommend using the tidyverse, or at least parts of it:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(matches("_\\d+_measure"), names_to=c("side", "no"), names_pattern="(\\w+)_(\\d+)_measure") %>% 
  filter(laterality == side) %>% 
  select(-side) %>% 
  pivot_wider(names_from=no, names_glue="aff_{no}_measure") %>% 
  full_join(df, by=c("recordID", "laterality"))

which returns

# A tibble: 4 x 11
  recordID laterality aff_1_measure aff_2_measure aff_3_measure right_1_measure right_2_measure right_3_measure
     <dbl> <chr>              <dbl>         <dbl>         <dbl>           <dbl>           <dbl>           <dbl>
1        1 right                2.3           1.3           2.7             2.3             1.3             2.7
2        2 right                3.4           2.2           2.8             3.4             2.2             2.8
3        3 left                 4.5           3.5           3.6             1.7             3.1             4.2
4        4 right                2.4           4.1           3.9             2.4             4.1             3.9
# ... with 3 more variables: left_1_measure <dbl>, left_2_measure <dbl>, left_3_measure <dbl>

Note: you can easily change the order of your columns so this output matches your desired output.

What did I do?

  • First we bring the data into a "long" format using pivot_longer. This allows us to filter the data for the correct laterality.
  • Now we have the measures we need to create the aff_n_measure columns using pivot_wider.
  • Finally we combine these new data with your old data using a full_join.
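
The steps above can be sketched with each intermediate result kept in its own object. This assumes dplyr and tidyr are installed; the two-row data frame df_small stands in for the question's data, and the names long, affected, wide and result are illustrative.

```r
library(dplyr)
library(tidyr)

df_small <- data.frame(
  recordID        = c(1, 2),
  laterality      = c("right", "left"),
  right_1_measure = c(2.3, 1.7),
  left_1_measure  = c(1.5, 4.5),
  stringsAsFactors = FALSE
)

# Step 1: one row per record, side and measure number
long <- df_small %>%
  pivot_longer(matches("_\\d+_measure"),
               names_to = c("side", "no"),
               names_pattern = "(\\w+)_(\\d+)_measure")

# Step 2: keep only the rows measured on the affected side
affected <- long %>%
  filter(laterality == side) %>%
  select(-side)

# Step 3: back to one row per record, with aff_ column names
wide <- affected %>%
  pivot_wider(names_from = no, values_from = value,
              names_glue = "aff_{no}_measure")

# Step 4: attach the new columns to the original data
result <- full_join(wide, df_small, by = c("recordID", "laterality"))
result$aff_1_measure   # 2.3 4.5
```

Printing long and affected along the way makes it easier to see what each step contributes.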

Upvotes: 0
