syhwyqp

Reputation: 77

Multiply columns by columns using substrings

I'm relatively new to R and am struggling with what may be a very simple problem.

I have data with multiple similarly named columns. Here is some sample data:

df = data.frame(PPID = 1:50, 
                time1 = sample(c(0,1), 50, replace = TRUE),
                time2 = sample(c(0,1), 50, replace = TRUE),
                time3 = sample(c(0,1), 50, replace = TRUE),
                condition1 = sample(c(0:3), 50, replace = TRUE),
                condition2 = sample(c(0:3), 50, replace = TRUE))

My actual data has many more columns: approximately 50 time columns and 10 condition columns.

I want to multiply each time column by each condition column. With the sample data this should give me 6 extra columns: time1_condition1, time1_condition2, time2_condition1, time2_condition2, time3_condition1, time3_condition2.
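For reference, the requested result can be produced with a plain double loop in base R. This is a minimal sketch, assuming the column names start with "time" and "condition" as in the sample; the small data frame below uses made-up fixed values instead of sample():

```r
# Hypothetical stand-in for the sample data (fixed values, 3 rows)
df <- data.frame(PPID = 1:3,
                 time1 = c(0, 1, 1), time2 = c(1, 0, 1), time3 = c(1, 1, 0),
                 condition1 = c(3, 0, 2), condition2 = c(1, 2, 0))

time_cols <- grep("^time", names(df), value = TRUE)
cond_cols <- grep("^condition", names(df), value = TRUE)

# For every time column, multiply it by every condition column and
# store the product as a new column named "<time>_<condition>"
for (tc in time_cols) {
  for (cc in cond_cols) {
    df[[paste(tc, cc, sep = "_")]] <- df[[tc]] * df[[cc]]
  }
}

names(df)[-(1:6)]  # the six new column names, time varying slowest
```

Adding columns one at a time is fine at this scale (50 × 10 = 500 new columns).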

I tried the solutions suggested in this thread, but they did not work (presumably because I didn't understand how mapply/apply work and did not adapt them properly): I got an error message saying that the longer argument is not a multiple of the length of the shorter one.
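That message is what mapply() reports when it is given the time names and the condition names directly: with 3 time columns and 2 condition columns it pairs them element by element and recycles the shorter vector, instead of forming every combination. One way around this, sketched below with made-up stand-in data and the same naming assumption, is to expand all pairs with expand.grid() first and then pass the two equal-length columns of that grid to mapply():

```r
# Hypothetical stand-in for the sample data
df <- data.frame(PPID = 1:3,
                 time1 = c(0, 1, 1), time2 = c(1, 0, 1), time3 = c(1, 1, 0),
                 condition1 = c(3, 0, 2), condition2 = c(1, 2, 0))

time_cols <- grep("^time", names(df), value = TRUE)
cond_cols <- grep("^condition", names(df), value = TRUE)

# One row per (time, condition) combination: 3 x 2 = 6 rows,
# so both vectors handed to mapply() now have the same length
pairs <- expand.grid(time = time_cols, cond = cond_cols,
                     stringsAsFactors = FALSE)

# mapply() walks the two columns in parallel; SIMPLIFY = FALSE keeps a list
products <- mapply(function(tc, cc) df[[tc]] * df[[cc]],
                   pairs$time, pairs$cond, SIMPLIFY = FALSE)
names(products) <- paste(pairs$time, pairs$cond, sep = "_")

result <- cbind(df, as.data.frame(products))
```

Here the new columns come out in expand.grid() order (time varying fastest); reorder them afterwards if the exact order matters.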

Any help would be greatly appreciated!

Upvotes: 1

Views: 107

Answers (3)

Maurits Evers

Reputation: 50738

Here is a tidyverse alternative:

library(tidyverse)
idx.time <- grep("time", names(df), value = TRUE)
idx.cond <- grep("condition", names(df), value = TRUE)
bind_cols(
    df,
    map_dfc(transpose(expand.grid(idx.time, idx.cond, stringsAsFactors = FALSE)),
        ~ setNames(data.frame(df[, .x$Var1] * df[, .x$Var2]),
                   paste(.x$Var1, .x$Var2, sep = "_"))))
#   PPID time1 time2 time3 condition1 condition2 time1_condition1
#1     1     1     0     1          3          0                3
#2     2     0     1     1          0          1                0
#3     3     0     1     1          0          2                0
#4     4     0     0     1          0          3                0
#5     5     0     0     0          0          3                0
#...

Explanation: expand.grid creates all pairwise combinations of idx.time and idx.cond. transpose turns that data frame inside out, returning a list with one element per row (similar to apply(..., 1, as.list)); map_dfc then applies the multiplication to every element of that list and column-binds the results.
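To see what is being iterated over, here is a base-R sketch of the same two steps: expand.grid(), then one list per row, which is the shape the explanation compares transpose() to. It assumes the column names from the sample data. Note that expand.grid() varies its first argument fastest, which fixes the order of the new columns:

```r
idx.time <- c("time1", "time2", "time3")
idx.cond <- c("condition1", "condition2")

grid <- expand.grid(idx.time, idx.cond, stringsAsFactors = FALSE)
# Var1 cycles time1, time2, time3 while Var2 stays condition1,
# then the cycle repeats with condition2

# One list per row, each with elements Var1 and Var2
# (the shape map_dfc() receives from transpose())
rows <- lapply(seq_len(nrow(grid)), function(i) as.list(grid[i, ]))

rows[[4]]$Var1  # "time1"
rows[[4]]$Var2  # "condition2"
```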

Upvotes: 2

Onyambu

Reputation: 79348

Using the tidyverse, first pull out the time and condition columns:

library(tidyverse)

a <- df[grep("time", names(df))]
b <- df[grep("condition", names(df))]

Then we can do:

map(a, ~ .x * b) %>%
  bind_cols() %>%
  set_names(paste(rep(names(a), each = ncol(b)), names(b), sep = "_"))
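This map() call (and the lapply() version in the answer below) relies on the fact that multiplying a numeric vector by a data frame, as in `.x * b`, multiplies the vector into every column of the data frame when the vector's length equals the number of rows. A minimal base-R check with made-up values:

```r
x <- c(1, 0, 1)                            # one "time" column
b <- data.frame(condition1 = c(3, 0, 2),   # the "condition" columns
                condition2 = c(1, 2, 0))

# Ops.data.frame applies x to each column of b, element by element
res <- x * b
# res$condition1 is 3, 0, 2 and res$condition2 is 1, 0, 0
```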

Alternatively:

cross2(a, b) %>%
  map(lift(`*`)) %>%
  set_names(paste(names(a), rep(names(b), each = ncol(a)), sep = "_")) %>%
  data.frame()

   time1_condition1 time2_condition1 time3_condition1 time1_condition2 time2_condition2 time3_condition2
1                 3                0                3                2                0                2
2                 3                3                0                1                1                0
3                 0                0                0                0                0                0
4                 3                3                0                0                0                0
5                 0                0                2                0                0                1
6                 0                0                1                0                0                1
7                 2                2                0                0                0                0
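The column names have to be generated in the same order as the products. cross2() varies its first argument fastest (as expand.grid() does), whereas `rep(names(a), each = ncol(b))` keeps each time name for several columns in a row, which matches the map() version instead. A small base-R sketch comparing the two patterns, assuming the sample column names:

```r
time_names <- c("time1", "time2", "time3")
cond_names <- c("condition1", "condition2")

# Time varies slowest: matches map(a, ~ .x * b) %>% bind_cols()
by_time <- paste(rep(time_names, each = length(cond_names)), cond_names,
                 sep = "_")

# Time varies fastest: matches the order of cross2(a, b)
by_cond <- paste(time_names, rep(cond_names, each = length(time_names)),
                 sep = "_")

# The same fastest-first order, built from expand.grid() directly
grid <- expand.grid(time = time_names, cond = cond_names,
                    stringsAsFactors = FALSE)
by_grid <- paste(grid$time, grid$cond, sep = "_")
```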

Upvotes: 1

Ronak Shah

Reputation: 389325

# Get the indices of all "time" columns
time_cols <- grep("^time", names(df))

# Get the indices of all "condition" columns
condition_cols <- grep("^condition", names(df))

# Multiply each "time" column by all the "condition" columns
# and combine the products into a new data frame
new_df <- do.call("cbind",
                  lapply(df[time_cols], function(x) x * df[condition_cols]))

# Combine both data frames
complete_df <- cbind(df, new_df)

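We can then generate matching column names with expand.grid. It varies its first argument fastest, which matches the column order of new_df:

grid <- expand.grid(cond = names(df)[condition_cols], time = names(df)[time_cols],
                    stringsAsFactors = FALSE)
new_names <- paste(grid$time, grid$cond, sep = "_")
colnames(complete_df)[(ncol(df) + 1):ncol(complete_df)] <- new_names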
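Alternatively, the names can be attached inside the lapply() call, so there is no need to index into complete_df by position afterwards. A minimal sketch of that variant with made-up stand-in data; `tc` and `res` are illustrative names, not part of the answer above:

```r
# Hypothetical stand-in for the sample data
df <- data.frame(PPID = 1:3,
                 time1 = c(0, 1, 1), time2 = c(1, 0, 1), time3 = c(1, 1, 0),
                 condition1 = c(3, 0, 2), condition2 = c(1, 2, 0))
time_cols <- grep("^time", names(df))
condition_cols <- grep("^condition", names(df))

# For each time column, multiply it into all condition columns and
# prefix the resulting column names with that time column's name
new_df <- do.call("cbind", lapply(names(df)[time_cols], function(tc) {
  res <- df[[tc]] * df[condition_cols]
  setNames(res, paste(tc, names(res), sep = "_"))
}))

complete_df <- cbind(df, new_df)
```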

Upvotes: 2
