Reputation: 85
I have a two dataframe. first data frame has only one column and the second dataframe has 7 columns and 30000 rows. Dataframe 1:
coulmn1
otu_1
otu_2
otu_3
otu_4
otu_5
otu_6
otu_7
otu_8
otu_9
otu_10
Second dataframe:
otu1 otu2 Name.x Name.y
otu_1 otu_2 Gemmiger Bacteroides
OTU_3 otu_1 Bifido Gemmiger
otu_4 otu_5 Fusobacterium fags
otu_6 otu_7 Dialister gems
otu_8 otu_9 Streptococcus hen
OTU_10 OTU_6 Clostridium IV Dialister
OTU_11 OTU_16 Clostridium IV Dialister
Now, I have to compare Dataframe 1 with the two columns (otu1 and otu2) in dataframe 2 and get their corresponding values in Name.x and Name.y (names are not unique for each id).i.e several otu ids may have the same name. But the otu ids are unique
The desire output will be:
coulmn1 Name
otu_1 Gemmiger
otu_2 Bacteroides
otu_3 Bifido
otu_4 Fusobacterium
otu_5 fags
otu_6 Dialister
otu_7 gems
otu_8 Streptococcus
otu_9 hen
otu_10 Clostridium IV
otu_11 Clostridium IV
otu_16 Dialister
Upvotes: 0
Views: 177
Reputation: 23574
Here is my attempt. I converted the original data to a long-format data. Then, I defined a group and replaced the 2nd element in otu1 in each group with the 2nd element in otu2 in the same group. Removing two columns (i.e., otu2 and name), I obtained unique values in the end.
pivot_longer(data = df2, cols = starts_with("Name"),
values_to = "Name") %>%
group_by(otu2) %>%
mutate(otu1 = tolower(if_else(row_number() == n(), otu2, otu1))) %>%
ungroup %>%
select(-c(otu2, name)) %>%
distinct(.keep_all = TRUE) %>%
rename(col1 = "otu1")
col1 Name
<chr> <chr>
1 otu_1 Gemmiger
2 otu_2 Bacteroides
3 otu_3 Bifido
4 otu_4 Fusobacterium
5 otu_5 fags
6 otu_6 Dialister
7 otu_7 gems
8 otu_8 Streptococcus
9 otu_9 hen
10 otu_10 Clostridium IV
11 otu_11 Clostridium IV
12 otu_16 Dialister
Data
df2 <- structure(list(otu1 = c("otu_1", "OTU_3", "otu_4", "otu_6", "otu_8",
"OTU_10", "OTU_11"), otu2 = c("otu_2", "otu_1", "otu_5", "otu_7",
"otu_9", "OTU_6", "OTU_16"), Name.x = c("Gemmiger", "Bifido",
"Fusobacterium", "Dialister", "Streptococcus", "Clostridium IV",
"Clostridium IV"), Name.y = c("Bacteroides", "Gemmiger", "fags",
"gems", "hen", "Dialister", "Dialister")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 1