Reputation: 50
I'm having a difficult time describing this, which is probably why I'm not certain what function I'm looking for. I would appreciate it if someone could tell me what this operation is called.
Basically, I have about a dozen .csv files, each containing a list of a few hundred genes. None of the lists includes all gene names.
What I'm looking to do is merge all those lists together and get a comprehensive list of all genes, with an indication of which files each gene shows up in. I don't need the values; 1s and 0s to indicate whether a name shows up in a given file are plenty.
I can already tell this may not make sense, so this analogy may help:
Let's say I have three files, Fruit A, Fruit B, and Fruit C. Fruit A has 2 apples, 3 bananas, and 1 orange. Fruit B has 1 apple and 1 coconut. Fruit C has 2 oranges and a lime. I want to merge it and produce a file that looks like this:
FRUIT NAME | Fruit A | Fruit B | Fruit C |
---|---|---|---|
Apple | 1 | 1 | 0 |
Banana | 1 | 0 | 0 |
Orange | 1 | 0 | 1 |
Coconut | 0 | 1 | 0 |
Lime | 0 | 0 | 1 |
Any advice would be greatly appreciated.
Reputation: 26218
Use the `%in%` operator to generate your individual file columns:

```r
library(dplyr)  # provides %>% and mutate()

fruitA <- c('apple', 'banana', 'orange')
fruitB <- c('apple', 'coconut')
fruitC <- c('orange', 'lime')

# One row per distinct name, then one 0/1 column per list
data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))) %>%
  mutate(colA = as.numeric(fruitname %in% fruitA),
         colB = as.numeric(fruitname %in% fruitB),
         colC = as.numeric(fruitname %in% fruitC))
```
```
  fruitname colA colB colC
1     apple    1    1    0
2    banana    1    0    0
3    orange    1    0    1
4   coconut    0    1    0
5      lime    0    0    1
```
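For reference, the same table can also be built in base R without dplyr. This is only a minimal sketch: `fruit_lists`, `all_names`, and `presence` are names made up for illustration, and the result is a 0/1 matrix (often called a presence/absence or incidence matrix) rather than a data frame.

```r
# Same toy vectors as in the example above
fruitA <- c('apple', 'banana', 'orange')
fruitB <- c('apple', 'coconut')
fruitC <- c('orange', 'lime')

# Hypothetical named list; the names become the column headers
fruit_lists <- list(fruitA = fruitA, fruitB = fruitB, fruitC = fruitC)

# Every distinct name across all lists, in order of first appearance
all_names <- unique(unlist(fruit_lists, use.names = FALSE))

# One 0/1 column per list: does each name occur in that list?
presence <- sapply(fruit_lists, function(x) as.integer(all_names %in% x))
rownames(presence) <- all_names
presence
```

Because the columns come from the list names, adding another list only means adding another element to `fruit_lists`.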
You may also make use of `purrr::reduce2` if you have many items:

```r
library(dplyr)
library(purrr)

# ..1 = accumulated data frame, ..2 = current vector, ..3 = its column name
reduce2(list(fruitA, fruitB, fruitC),
        list("fruitA", "fruitB", "fruitC"),
        .init = data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))),
        ~ ..1 %>% mutate(!!..3 := +(fruitname %in% ..2)))
```
```
  fruitname fruitA fruitB fruitC
1     apple      1      1      0
2    banana      1      0      0
3    orange      1      0      1
4   coconut      0      1      0
5      lime      0      0      1
```
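Since the question starts from .csv files rather than vectors, here is a rough sketch of how the same idea might be applied to a folder of files. Everything about the file layout is an assumption: the folder name `gene_lists`, the file names, and the names sitting in the first column of each file are all hypothetical, and the three small files are written to a temporary directory only so the sketch runs as-is.

```r
# Hypothetical setup: create three small CSV files in a temp folder.
# With real data, skip this part and point `dir` at the actual folder.
dir <- file.path(tempdir(), "gene_lists")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(gene = c("apple", "apple", "banana", "orange")),
          file.path(dir, "fruitA.csv"), row.names = FALSE)
write.csv(data.frame(gene = c("apple", "coconut")),
          file.path(dir, "fruitB.csv"), row.names = FALSE)
write.csv(data.frame(gene = c("orange", "orange", "lime")),
          file.path(dir, "fruitC.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Assumption: the names are in the first column of every file
lists <- lapply(files, function(f) {
  unique(read.csv(f, stringsAsFactors = FALSE)[[1]])
})
names(lists) <- tools::file_path_sans_ext(basename(files))

# All distinct names, then one 0/1 column per file
all_names <- unique(unlist(lists, use.names = FALSE))
presence <- data.frame(
  name = all_names,
  sapply(lists, function(x) as.integer(all_names %in% x)),
  check.names = FALSE  # keep file names as column names unchanged
)
presence
```

Duplicate names within a file collapse to a single 1, which matches the "1s and 0s" requirement in the question. `write.csv(presence, ...)` could then save the result.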