R - Merging csvs with similar data to show if row names are present in respective files

Question

I'm having a difficult time describing this, which is probably why I'm not certain what function I'm looking for. I would appreciate it if someone could tell me what this operation is called.

Basically, I have about a dozen .csv files, each containing a list of a few hundred genes. None of the lists includes all gene names.

What I'm looking to do is merge all those lists together and get a comprehensive list of all genes, with an indication of which files each gene shows up in. I don't need the values; 1s and 0s to indicate whether a name shows up in a given file is plenty.

I can already tell this may not make sense, so this analogy may help:

Let's say I have three files, Fruit A, Fruit B, and Fruit C. Fruit A has 2 apples, 3 bananas, and 1 orange. Fruit B has 1 apple and 1 coconut. Fruit C has 2 oranges and a lime. I want to merge it and produce a file that looks like this:

FRUIT NAME	Fruit A	Fruit B	Fruit C
Apple	1	1	0
Banana	1	0	0
Orange	1	0	1
Coconut	0	1	0
Lime	0	0	1

Any advice would be greatly appreciated.

AnilGoyal · Accepted Answer

First, convert all three .csv files into individual vectors, since you have stated that they are just lists.
Then take the unique elements across these lists.
Finally, use the %in% operator to generate the individual file columns.

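library(dplyr)  # provides %>% and mutate()

# One character vector of names per file
fruitA <- c('apple', 'banana', 'orange')
fruitB <- c('apple', 'coconut')
fruitC <- c('orange', 'lime')

# One row per unique name; each column is 1 if the name occurs in that file, else 0
data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))) %>%
  mutate(colA = as.numeric(fruitname %in% fruitA),
         colB = as.numeric(fruitname %in% fruitB),
         colC = as.numeric(fruitname %in% fruitC))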

  fruitname colA colB colC
1     apple    1    1    0
2    banana    1    0    0
3    orange    1    0    1
4   coconut    0    1    0
5      lime    0    0    1
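
To apply the same idea to files on disk rather than hand-typed vectors, something like the following base R sketch may help. It assumes the names sit in the first column of each .csv; the temporary directory, the file names, and the `gene` and `n` column names are made up purely so the example is self-contained.

```r
# Hypothetical setup: write three small CSVs into a fresh temporary directory
csv_dir <- tempfile("gene_lists")
dir.create(csv_dir)
write.csv(data.frame(gene = c("apple", "banana", "orange"), n = c(2, 3, 1)),
          file.path(csv_dir, "fruitA.csv"), row.names = FALSE)
write.csv(data.frame(gene = c("apple", "coconut"), n = c(1, 1)),
          file.path(csv_dir, "fruitB.csv"), row.names = FALSE)
write.csv(data.frame(gene = c("orange", "lime"), n = c(2, 1)),
          file.path(csv_dir, "fruitC.csv"), row.names = FALSE)

# Read every .csv and keep only the first column as a character vector
# (assumption: the names are in the first column of each file)
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
lists <- lapply(files, function(f) {
  unique(as.character(read.csv(f, stringsAsFactors = FALSE)[[1]]))
})
names(lists) <- tools::file_path_sans_ext(basename(files))

# All names seen in any file, then one 0/1 column per file via %in%
all_names <- unique(unlist(lists, use.names = FALSE))
presence <- data.frame(name = all_names,
                       lapply(lists, function(x) as.integer(all_names %in% x)))
print(presence)
```

Because the columns are named after the files, this scales to a dozen files without typing each column by hand.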

You may also make use of purrr::reduce2 if you have many items.

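library(dplyr)
library(purrr)

# ..1 = accumulated data frame, ..2 = current vector, ..3 = its column name
reduce2(list(fruitA, fruitB, fruitC),
        list("fruitA", "fruitB", "fruitC"),
        .init = data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))),
        ~ ..1 %>% mutate(!!..3 := +(fruitname %in% ..2)))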

  fruitname fruitA fruitB fruitC
1     apple      1      1      0
2    banana      1      0      0
3    orange      1      0      1
4   coconut      0      1      0
5      lime      0      0      1
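
As for what this is called: the result is often described as a presence/absence (or incidence) matrix. As an alternative sketch that is not part of the approach above, base R can build one by stacking the named vectors into long form and cross-tabulating with `table()`. The `lists`, `long`, and `incidence` names here are only illustrative.

```r
# Named list of character vectors, one per file (same toy data as above)
lists <- list(
  fruitA = c("apple", "banana", "orange"),
  fruitB = c("apple", "coconut"),
  fruitC = c("orange", "lime")
)

# Long form: one row per (name, file) pair; unique() guards against
# a name being listed twice in the same file
long <- unique(stack(lists))

# Cross-tabulate names against files; each cell is 1 if present, 0 if absent
incidence <- as.data.frame.matrix(table(long$values, long$ind))
print(incidence)
```

Note that `table()` sorts the row names alphabetically, so the row order differs from the `%in%` versions, which keep the order of first appearance.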
