smicaela
smicaela

Reputation: 109

R - Compare two columns from different datasets and get new dataset

I have two datasets. For each dataset, every two columns is a pair with *a as 1st column and *b as 2nd column.

E.g. df1, pair A1, df1$A1a = 1st column & df1$A1b = 2nd column.

Likewise, df2, pair B1, df2$B1a = 1st column & df2$B1b = 2nd column.

df1:

ID A1a A1b A2a A2b
1  2   3   2   3
2  3   1   2   1 
3  1   3   1   2
4  2   2   3   3
5  1   2   2   1 

df2:

ID B1a B1b B2a B2b
1  1   2   2   3
2  3   2   1   1 
3  2   3   2   2
4  3   2   2   3
5  2   2   3   1  

The final data (df3) should look like this:

ID C1a C1b C2a C2b
1  1   2   2   3
2  3   1   1   1 
3  1   3   1   2
4  2   2   2   3
5  1   2   2   1  

I would like to do the following:

  1. First, compare the 1st column of each pair between df1 and df2 and identify the lowest value. E.g. For ID=1, compare df1$A1a = 2 with df2$B1a = 1, since df2$B1a has the lower value, mutate new columns with the pairs from df2. I.e. df3$C1a = 1, df3$C1b = 2.
  2. If the 1st column of each pair is the same, then use 2nd column to determine which pairs of values to mutate new columns. E.g. for ID=2, 1st column shows df1$A1a = 3 and df2$B1a = 3, therefore use 2nd column to determine, since df1$A1b = 1 and df2$B1b = 2, the pairs of values should come from df1. I.e. df3$C1a = 3 and df3$C1b = 1.
  3. If both the pairs from df1 and df2 are the same, just use those values. E.g. for ID=1, 1st column df1$A2a = 2 and df2$B2a = 2 are the same, and 2nd column df1$A2b = 3 and df2$B2b = 3 are the same, then new columns should be df3$C1a = 2 and df3$C1b = 3.

Hoping to automate the above so that the code automatically compares every pair from df1 with df2 so that I do not need to compare the pairs individually (e.g. do A1 and B1 first, then do A2 and B2, etc) but rather the code just repeats for every pair in the datasets. Thank you for any help!

Upvotes: 0

Views: 628

Answers (1)

yh6
yh6

Reputation: 389

This is a bit too lengthy, but still it does the trick.

map2(
  list(df1, df2), c("A", "B"),
  function(df, df_chr){
    df %>% pivot_longer(cols=-ID, values_to=df_chr) %>%
      mutate(name=str_replace(name, df_chr, "")) %>% return()
  }
) %>% reduce(left_join, by=c("ID", "name")) %>%
  mutate(name=name %>% str_split(""), id1=map_chr(name, ~ .[[1]]),
         id2=map_chr(name, ~ .[[2]]), .after=ID) %>%
  select(-name) %>% nest(data=-c(ID, id1)) %>%
  mutate(data=map(data, function(data){
    if((data %>% slice(1) %>% .$A) != (data %>% slice(1) %>% .$B)){
      min_col_num <- (data %>% slice(1) %>% select(-id2) %>% which.min() %>% unname()) + 1
      data %>% select(id2, value=min_col_num) %>% return()
    }else{
      min_col_num <- (data %>% slice(2) %>% select(-id2) %>% which.min() %>% unname()) + 1
      data %>% select(id2, value=min_col_num) %>% return()
    }
  })) %>% unnest(cols=data) %>% mutate(name=str_c("C", id1, id2), .after=ID) %>%
  select(-c(id1, id2)) %>% pivot_wider()

Upvotes: 1

Related Questions