user15791858
user15791858

Reputation: 185

Combining Using Two Columns

A very basic question about merging interchangable columns.

Say I have a table

  subject stim1 stim2 Chosen Tchosen
     <dbl> <int> <int>  <int>   <int>
 1       1     1     2      1       4
 2       1     1     2      2      15
 3       1     1     3      1       2
 4       1     1     3      3      13
 5       1     2     1      1       2
 6       1     2     1      2      13
 7       1     2     3      2       3

where stim1 and stim 2 are interchangable (Stim1=1,Stim2=2 is equivalent to Stim1=2, Stim2=1)

What is the simplest way to merge the data so that Tchosen is added from the two equivalent columns (though Chosen and by subject should be maintained distinctly)

Desired output

   subject stim1 stim2 Chosen Tchosen
     <dbl> <int> <int>  <int>   <int>
 1       1     1     2      1       6
 2       1     1     2      2      28
 3       1     1     3      1       4
 4       1     1     3      3      28
 5       1     2     3      2       3
 6       1     2     3      3      12...

Thank you

Upvotes: 0

Views: 50

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389235

You can use pmin/pmax to sort stim1 and stim2 columns and calculate sum for each group.

aggregate(Tchosen~., transform(df, stim1 = pmin(stim1, stim2), 
                                   stim2 = pmax(stim1, stim2)), sum)

#  subject stim1 stim2 Chosen Tchosen
#1       1     1     2      1       6
#2       1     1     3      1       2
#3       1     1     2      2      28
#4       1     2     3      2       3
#5       1     1     3      3      13

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 102625

A base R option using merge + rowSums

transform(
  merge(df,
    df,
    by.x = c("subject", "stim1", "stim2", "Chosen"),
    by.y = c("subject", "stim2", "stim1", "Chosen"),
    all.x = TRUE
  ),
  Tchosen = rowSums(cbind(Tchosen.x, Tchosen.y), na.rm = TRUE)
)

which gives

  subject stim1 stim2 Chosen Tchosen.x Tchosen.y Tchosen
1       1     1     2      1         4         2       6
2       1     1     2      2        15        13      28
3       1     1     3      1         2        NA       2
4       1     1     3      3        13        NA      13
5       1     2     1      1         2         4       6
6       1     2     1      2        13        15      28
7       1     2     3      2         3        NA       3

where NA exists probably due to the incomplete data in your post.

Data

> dput(df)
structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), stim1 = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L), stim2 = c(2L, 2L, 3L, 3L, 1L, 1L, 3L
), Chosen = c(1L, 2L, 1L, 3L, 1L, 2L, 2L), Tchosen = c(4L, 15L,
2L, 13L, 2L, 13L, 3L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7")) 

Upvotes: 1

Wimpel
Wimpel

Reputation: 27802

Here is a data.table approach.. could not reproduce your desired output, since it contains values that are not in your sample data?

library(data.table)
DT <- fread("  subject stim1 stim2 Chosen Tchosen
       1     1     2      1       4
       1     1     2      2      15
       1     1     3      1       2
       1     1     3      3      13
       1     2     1      1       2
       1     2     1      2      13
       1     2     3      2       3")

# Switch values of stim2 and stim1 if stim2 < stim1
DT[stim2 < stim1, `:=`(stim1 = stim2, stim2 = stim1)]
# Now summarise and sum
DT[, .(Tchosen = sum(Tchosen, na.rm = TRUE)), by = .(subject,stim1, stim2, Chosen)]
#    subject stim1 stim2 Chosen Tchosen
# 1:       1     1     2      1       6
# 2:       1     1     2      2      28
# 3:       1     1     3      1       2
# 4:       1     1     3      3      13
# 5:       1     2     3      2       3

Upvotes: 1

Related Questions