Woj

Reputation: 469

R: How to simply compare values of columns in 2 data frames

I am comparing two data frames, FU and FO. Here are short samples of what they look like:

"Model_ID" "FU_Lin_Period" "FU_Growth_rate"
2 0.72127 0.0093333
3 0.69281 0.015857
4 0.66735 0.021103
5 0.64414 0.024205
6 0.62288 0.026568
7 0.60318 0.027749
8 0.58472 0.028161
9 0.56734 0.028008
10 0.55085 0.027309
11 0.53522 0.026068
12 0.52029 0.024684
13 0.50603 0.022866
14 0.49237 0.020991
15 0.47928 0.018773
"Model_ID" "FO_Lin_Period" "FO_Growth_rate"
7 0.44398 0.008868
8 0.43114 0.01674
9 0.41896 0.023248
10 0.40728 0.028641
11 0.39615 0.032192
12 0.38543 0.03543
13 0.37517 0.03692
14 0.36525 0.038427
15 0.35573 0.038195

As you can tell, they do not share all of the same Model_ID values.

Basically, what I want to do is go through every Model_ID in the two tables, compare whether FU or FO's growth rate is larger for a given model ID, and...

Is there a way to do this without using loops?

Upvotes: 1

Views: 54

Answers (2)

thelatemail

Reputation: 93813

data.table alternative using similar logic to the tidyverse answer.

Replace NAs with -Inf, compare the two FU/FO_Growth_rate variables, flag which group had the larger value (ties go to FO, because the comparison is a strict >), and collect the matching Model_ID values into the requested variables.

library(data.table)
setDT(FU)
setDT(FO)

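# Full outer join keeps every Model_ID that appears in either table
out <- merge(FU, FO, by="Model_ID", all=TRUE)[,
    # TRUE + 1 = 2 picks "FU"; FALSE + 1 = 1 picks "FO"
    "gr_sel" := c("FO","FU")[(nafill(FU_Growth_rate, fill=-Inf) >
                              nafill(FO_Growth_rate, fill=-Inf)) + 1]
]
# Pull out the Model_IDs for which each group had the larger growth rate
selected_FU <- out[gr_sel == "FU", Model_ID]
selected_FO <- out[gr_sel == "FO", Model_ID]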

Data used:

FU <- read.table(text="Model_ID FU_Lin_Period FU_Growth_rate\n2 0.72127 0.0093333\n3 0.69281 0.015857\n4 0.66735 0.021103\n5 0.64414 0.024205\n6 0.62288 0.026568\n7 0.60318 0.027749\n8 0.58472 0.028161\n9 0.56734 0.028008\n10 0.55085 0.027309\n11 0.53522 0.026068\n12 0.52029 0.024684\n13 0.50603 0.022866\n14 0.49237 0.020991\n15 0.47928 0.018773", header=TRUE)
FO <- read.table(text="Model_ID FO_Lin_Period FO_Growth_rate\n7 0.44398 0.008868\n8 0.43114 0.01674\n9 0.41896 0.023248\n10 0.40728 0.028641\n11 0.39615 0.032192\n12 0.38543 0.03543\n13 0.37517 0.03692\n14 0.36525 0.038427\n15 0.35573 0.038195", header=TRUE)
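
For readers who do not have data.table installed, the same merge-then-compare logic can be sketched in base R. This is a minimal illustration rather than part of the original answer: it uses three rows taken from the question's sample data, and the helper vectors `fu` and `fo` are names introduced here for the example.

```r
# Three rows from the question's sample data (Model_ID 6 exists only in FU)
FU <- data.frame(Model_ID = c(6, 7, 10),
                 FU_Growth_rate = c(0.026568, 0.027749, 0.027309))
FO <- data.frame(Model_ID = c(7, 10),
                 FO_Growth_rate = c(0.008868, 0.028641))

# Full outer join: keep every Model_ID found in either table
both <- merge(FU, FO, by = "Model_ID", all = TRUE)

# Treat a missing growth rate as -Inf so the other group wins the comparison
fu <- ifelse(is.na(both$FU_Growth_rate), -Inf, both$FU_Growth_rate)
fo <- ifelse(is.na(both$FO_Growth_rate), -Inf, both$FO_Growth_rate)

# Vectorised comparison, no loop needed; ties go to "FO" as in the answer above
both$gr_sel <- ifelse(fu > fo, "FU", "FO")

selected_FU <- both$Model_ID[both$gr_sel == "FU"]
selected_FO <- both$Model_ID[both$gr_sel == "FO"]
```

With these three rows, Model_ID 6 and 7 land in `selected_FU` and Model_ID 10 lands in `selected_FO`.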

Upvotes: 1

Nicolás Velasquez

Reputation: 5898

With dplyr, tidyr, and readr:

library(dplyr)
library(tidyr)
library(readr)

# Note: read_table2() is deprecated as of readr 2.0.0; read_table() is its replacement
FU <- read_table2("test.FU.LINA.table")
FO <- read_table2("test.FO.LINA.table")

df_compared <- 
  full_join(FU, FO, by = "model_id") %>% 
  replace_na(list(fo_growth_rate = -1, fu_growth_rate = -1)) %>%
  mutate(select_fufo = if_else(fu_growth_rate >= fo_growth_rate, true = "fu", false = "fo"))

df_compared
# A tibble: 6,166 x 6
   model_id fu_lin_period fu_growth_rate fo_lin_period fo_growth_rate select_fufo
      <dbl>         <dbl>          <dbl>         <dbl>          <dbl> <chr>      
 1        2         0.721        0.00933        NA           -1       fu         
 2        3         0.693        0.0159         NA           -1       fu         
 3        4         0.667        0.0211         NA           -1       fu         
 4        5         0.644        0.0242         NA           -1       fu         
 5        6         0.623        0.0266         NA           -1       fu         
 6        7         0.603        0.0277          0.444        0.00887 fu         
 7        8         0.585        0.0282          0.431        0.0167  fu         
 8        9         0.567        0.0280          0.419        0.0232  fu         
 9       10         0.551        0.0273          0.407        0.0286  fo         
10       11         0.535        0.0261          0.396        0.0322  fo         
# ... with 6,156 more rows

selected_fu <- df_compared %>% filter(select_fufo == "fu") %>% .$model_id
selected_fo <- df_compared %>% filter(select_fufo == "fo") %>% .$model_id
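
The -1 placeholder above works only if no real growth rate is -1 or lower. A possible variant, sketched here as an assumption and not taken from the original answer, substitutes -Inf inside the comparison with `dplyr::coalesce()`, which also leaves the NA values visible in the joined table. The example uses the column names from the question and three rows of its sample data.

```r
library(dplyr)

# Three rows from the question's sample data (Model_ID 6 exists only in FU)
FU <- data.frame(Model_ID = c(6, 7, 10),
                 FU_Growth_rate = c(0.026568, 0.027749, 0.027309))
FO <- data.frame(Model_ID = c(7, 10),
                 FO_Growth_rate = c(0.008868, 0.028641))

df_compared <- full_join(FU, FO, by = "Model_ID") %>%
  # coalesce() swaps NA for -Inf only inside the comparison,
  # so the growth-rate columns themselves keep their NA values
  mutate(select_fufo = if_else(
    coalesce(FU_Growth_rate, -Inf) >= coalesce(FO_Growth_rate, -Inf),
    "FU", "FO"
  ))

selected_fu <- df_compared %>% filter(select_fufo == "FU") %>% pull(Model_ID)
selected_fo <- df_compared %>% filter(select_fufo == "FO") %>% pull(Model_ID)
```

As in the answer's own code, `>=` means a tie is assigned to FU; switch to `>` if ties should go to FO instead.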

Upvotes: 0
