Reputation: 31

R randomly swap values between two columns in dataframe

I repeated an experiment (rep1 and rep2). For each replicate I have two columns (a, sum) and two rows of the tested subjects that belong together (group AA, BB...). For analysis, I would like to randomly assign the collected data (a and sum) to rep1 and rep2. For this, I was trying to randomly select groups and swap "a" and "sum" of rep1 and rep2. I am trying to repeat the random swapping 100 times, creating 100 datasets for analysis.

I came across unique(df$group) as a way to specify that the data of each group belongs together. Combined with sample(), as in sample(unique(df$group), 2), it randomly picks, say, 2 groups. But I don't know how to swap the replicate data of the selected groups.

Here is an example of the data:

group = c("A", "A", "B", "B", "C", "C")
rep1_a = c(2, 8, 5, 5, 4, 6)
rep1_sum = c(10, 10, 10, 10, 10, 10)
rep2_a = c(3, 8, 4, 5, 5, 6)
rep2_sum = c(11, 11, 9, 9, 11, 11)
df = data.frame(group, rep1_a, rep1_sum, rep2_a, rep2_sum)

#   group rep1_a rep1_sum rep2_a rep2_sum
# 1     A      2       10      3       11
# 2     A      8       10      8       11
# 3     B      5       10      4        9
# 4     B      5       10      5        9
# 5     C      4       10      5       11
# 6     C      6       10      6       11

And here is what it should look like, if out of these 3 groups, the replicates of group A are swapped:

    group     rep1_a    rep1_sum    rep2_a    rep2_sum
1     A          3         11          2         10
2     A          8         11          8         10
3     B          5         10          4          9
4     B          5         10          5          9
5     C          4         10          5         11
6     C          6         10          6         11

Upvotes: 3

Answers (2)

thelatemail

Reputation: 93803

A data.table version:

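library(data.table)
setDT(df)
# draw one TRUE/FALSE per group: TRUE means rep1 and rep2 are swapped for that group
df[, swap := sample(c(TRUE, FALSE), 1), by = group]
rbind(
  df[(!swap)],
  # select the columns in swapped order, then restore the original column names
  df[(swap), setNames(.(group, rep2_a, rep2_sum, rep1_a, rep1_sum, swap), names(df))]
)[order(group)]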

It swaps the rep1 and rep2 columns for every group where the swap variable is TRUE; otherwise the rows of the group are returned unchanged.
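
To get the 100 datasets asked for in the question, the same idea could be wrapped in a function and repeated. Below is a minimal base R sketch, not part of the data.table code above. It assumes a fixed number of groups is swapped per dataset, as in the question's sample(unique(df$group), 2) attempt; the names swap_random_groups and n_groups are hypothetical and only used for illustration.

```r
# Example data from the question
df <- data.frame(
  group    = c("A", "A", "B", "B", "C", "C"),
  rep1_a   = c(2, 8, 5, 5, 4, 6),
  rep1_sum = c(10, 10, 10, 10, 10, 10),
  rep2_a   = c(3, 8, 4, 5, 5, 6),
  rep2_sum = c(11, 11, 9, 9, 11, 11)
)

# Hypothetical helper: pick n_groups groups at random and exchange
# their rep1 and rep2 values; all other rows are left untouched.
swap_random_groups <- function(df, n_groups = 1) {
  picked <- sample(unique(df$group), n_groups)
  rows <- df$group %in% picked
  out <- df
  # the right-hand side is assigned by position, so this swaps the columns
  out[rows, c("rep1_a", "rep1_sum", "rep2_a", "rep2_sum")] <-
    df[rows, c("rep2_a", "rep2_sum", "rep1_a", "rep1_sum")]
  out
}

set.seed(1)  # only for reproducibility of the sketch
datasets <- replicate(100, swap_random_groups(df, n_groups = 2), simplify = FALSE)
```

datasets is then a list of 100 data frames that can be passed to the analysis with lapply(). If every group should instead be swapped with probability 0.5, as in the data.table code, picked could be drawn with a per-group coin flip rather than a fixed n_groups.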

Upvotes: 0

Julius

Reputation: 287

Here's one way of doing it with dplyr. The following code builds a new data set 100 times, each time picking rep1 or rep2 at random (with equal probability) for every group, and runs the desired analysis on each new data set.

library(dplyr)
# tibble() replaces the deprecated data_frame()
exp_data <- tibble()
analysis_result <- tibble()
for (i in 1:100) {
  # Your new 'experiment', made by randomly mixing the two real experiments; indicated by 'exp_id'
  new_df <- df %>%
    group_by(group) %>%
    # one random draw per group, so all rows of a group come from the same replicate
    mutate(x = runif(1)) %>%
    mutate(repr_a = ifelse(x > 0.5, rep1_a, rep2_a),
           repr_sum = ifelse(x > 0.5, rep1_sum, rep2_sum),
           exp_id = i) %>%
    select(exp_id, group, repr_a, repr_sum)
  # Your analysis - below is my example
  new_analysis <- new_df %>%
    group_by(exp_id, group) %>%
    summarise(outcome = mean(repr_a * repr_sum))
  # collect the data sets and results of all iterations
  exp_data <- bind_rows(exp_data, new_df)
  analysis_result <- bind_rows(analysis_result, new_analysis)
}

Upvotes: 1
