I am comparing two data frames, FU and FO. Here are short samples of what they look like:
"Model_ID" "FU_Lin_Period" "FU_Growth_rate"
2 0.72127 0.0093333
3 0.69281 0.015857
4 0.66735 0.021103
5 0.64414 0.024205
6 0.62288 0.026568
7 0.60318 0.027749
8 0.58472 0.028161
9 0.56734 0.028008
10 0.55085 0.027309
11 0.53522 0.026068
12 0.52029 0.024684
13 0.50603 0.022866
14 0.49237 0.020991
15 0.47928 0.018773
"Model_ID" "FO_Lin_Period" "FO_Growth_rate"
7 0.44398 0.008868
8 0.43114 0.01674
9 0.41896 0.023248
10 0.40728 0.028641
11 0.39615 0.032192
12 0.38543 0.03543
13 0.37517 0.03692
14 0.36525 0.038427
15 0.35573 0.038195
As you can see, they do not share all the same Model_IDs.
Basically, what I want to do is go through every Model_ID in the two tables, compare whether FU's or FO's growth rate is larger for that Model_ID, and...
selected_FU
selected_FO
Is there a way to do this without using loops?
A data.table alternative, using logic similar to the tidyverse answer.
Replace NAs with -Infinity, compare the two FU/FO_Growth_rate variables, flag which group has the larger value, and select the Model_IDs into the requested variables.
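library(data.table)
setDT(FU)
setDT(FO)
# Full outer join, so Model_IDs present in only one table are kept
out <- merge(FU, FO, by = "Model_ID", all = TRUE)
# A missing growth rate becomes -Inf, so the other table always wins;
# (TRUE/FALSE) + 1 indexes "FU"/"FO", so ties go to "FO"
out[, gr_sel := c("FO", "FU")[(nafill(FU_Growth_rate, fill = -Inf) >
                               nafill(FO_Growth_rate, fill = -Inf)) + 1]]
selected_FU <- out[gr_sel == "FU", Model_ID]
selected_FO <- out[gr_sel == "FO", Model_ID]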
Data used:
FU <- read.table(text="Model_ID FU_Lin_Period FU_Growth_rate\n2 0.72127 0.0093333\n3 0.69281 0.015857\n4 0.66735 0.021103\n5 0.64414 0.024205\n6 0.62288 0.026568\n7 0.60318 0.027749\n8 0.58472 0.028161\n9 0.56734 0.028008\n10 0.55085 0.027309\n11 0.53522 0.026068\n12 0.52029 0.024684\n13 0.50603 0.022866\n14 0.49237 0.020991\n15 0.47928 0.018773", header=TRUE)
FO <- read.table(text="Model_ID FO_Lin_Period FO_Growth_rate\n7 0.44398 0.008868\n8 0.43114 0.01674\n9 0.41896 0.023248\n10 0.40728 0.028641\n11 0.39615 0.032192\n12 0.38543 0.03543\n13 0.37517 0.03692\n14 0.36525 0.038427\n15 0.35573 0.038195", header=TRUE)
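For comparison, the same logic (full join, NA treated as -Inf, vectorised comparison) can also be written in base R without data.table. This is a minimal sketch, not part of the original answer: it rebuilds the sample data with data.frame() and leaves out the Lin_Period columns, since they do not affect the comparison.

```r
# Sample growth rates from the question (Lin_Period columns omitted)
FU <- data.frame(
  Model_ID       = 2:15,
  FU_Growth_rate = c(0.0093333, 0.015857, 0.021103, 0.024205, 0.026568,
                     0.027749, 0.028161, 0.028008, 0.027309, 0.026068,
                     0.024684, 0.022866, 0.020991, 0.018773)
)
FO <- data.frame(
  Model_ID       = 7:15,
  FO_Growth_rate = c(0.008868, 0.01674, 0.023248, 0.028641, 0.032192,
                     0.03543, 0.03692, 0.038427, 0.038195)
)

# Full outer join: keeps Model_IDs that appear in only one table
out <- merge(FU, FO, by = "Model_ID", all = TRUE)

# Treat a missing growth rate as -Inf so the table that has a value wins
fu <- ifelse(is.na(out$FU_Growth_rate), -Inf, out$FU_Growth_rate)
fo <- ifelse(is.na(out$FO_Growth_rate), -Inf, out$FO_Growth_rate)

# Vectorised comparison over all rows at once; ties go to "FO"
out$gr_sel <- ifelse(fu > fo, "FU", "FO")

selected_FU <- out$Model_ID[out$gr_sel == "FU"]
selected_FO <- out$Model_ID[out$gr_sel == "FO"]

print(selected_FU)  # expected: 2 3 4 5 6 7 8 9
print(selected_FO)  # expected: 10 11 12 13 14 15
```

ifelse() operates on whole vectors, so there is still no explicit loop, which is what the question asks for.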
With dplyr, tidyr, and readr.
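library(dplyr)
library(tidyr)
library(readr)
# read_table2() reads whitespace-separated files; in readr >= 2.0 it is
# superseded by read_table(). Note that these files use lower-case
# column names (model_id, fu_growth_rate, ...).
FU <- read_table2("test.FU.LINA.table")
FO <- read_table2("test.FO.LINA.table")
# Full join keeps model_ids that appear in only one table; a missing growth
# rate is set to -1 so the other table wins (assumes real rates are > -1)
df_compared <-
  full_join(FU, FO, by = "model_id") %>%
  replace_na(list(fo_growth_rate = -1, fu_growth_rate = -1)) %>%
  # ties go to "fu" because of >=
  mutate(select_fufo = if_else(fu_growth_rate >= fo_growth_rate, true = "fu", false = "fo"))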
df_compared
# A tibble: 6,166 x 6
model_id fu_lin_period fu_growth_rate fo_lin_period fo_growth_rate select_fufo
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 2 0.721 0.00933 NA -1 fu
2 3 0.693 0.0159 NA -1 fu
3 4 0.667 0.0211 NA -1 fu
4 5 0.644 0.0242 NA -1 fu
5 6 0.623 0.0266 NA -1 fu
6 7 0.603 0.0277 0.444 0.00887 fu
7 8 0.585 0.0282 0.431 0.0167 fu
8 9 0.567 0.0280 0.419 0.0232 fu
9 10 0.551 0.0273 0.407 0.0286 fo
10 11 0.535 0.0261 0.396 0.0322 fo
# ... with 6,156 more rows
selected_fu <- df_compared %>% filter(select_fufo == "fu") %>% pull(model_id)
selected_fo <- df_compared %>% filter(select_fufo == "fo") %>% pull(model_id)
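The same pipeline can be run directly on the question's sample data and column names (Model_ID, FU_Growth_rate, FO_Growth_rate). This is a sketch under that assumption, not the answer's exact code: it swaps replace_na() with a -1 sentinel for dplyr's coalesce() with -Inf, so it no longer relies on growth rates staying above -1, and it leaves the NAs visible in the joined columns.

```r
library(dplyr)

# Sample growth rates from the question (Lin_Period columns omitted)
FU <- data.frame(
  Model_ID       = 2:15,
  FU_Growth_rate = c(0.0093333, 0.015857, 0.021103, 0.024205, 0.026568,
                     0.027749, 0.028161, 0.028008, 0.027309, 0.026068,
                     0.024684, 0.022866, 0.020991, 0.018773)
)
FO <- data.frame(
  Model_ID       = 7:15,
  FO_Growth_rate = c(0.008868, 0.01674, 0.023248, 0.028641, 0.032192,
                     0.03543, 0.03692, 0.038427, 0.038195)
)

df_compared <- full_join(FU, FO, by = "Model_ID") %>%
  # coalesce() substitutes -Inf only inside the comparison; ties go to "fu"
  mutate(select_fufo = if_else(coalesce(FU_Growth_rate, -Inf) >=
                                 coalesce(FO_Growth_rate, -Inf),
                               "fu", "fo"))

selected_fu <- df_compared %>% filter(select_fufo == "fu") %>% pull(Model_ID)
selected_fo <- df_compared %>% filter(select_fufo == "fo") %>% pull(Model_ID)

print(selected_fu)  # expected: 2 3 4 5 6 7 8 9
print(selected_fo)  # expected: 10 11 12 13 14 15
```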