mikephel

Reputation: 50

R - Merging csvs with similar data to show if row names are present in respective files

I'm having a hard time describing this, which is probably why I'm not sure what function I'm looking for. I'd appreciate it if someone could tell me what this operation is called.

Basically, I have about a dozen .csv files, each containing a list of a few hundred genes. None of the lists includes every gene name.

What I'd like to do is merge all of those lists into one comprehensive list of genes, with an indication of which files each gene shows up in. I don't need the values; 1s and 0s showing whether a name appears in a given file are plenty.

I can already tell this may not make sense, so this analogy may help:

Let's say I have three files, Fruit A, Fruit B, and Fruit C. Fruit A has 2 apples, 3 bananas, and 1 orange. Fruit B has 1 apple and 1 coconut. Fruit C has 2 oranges and a lime. I want to merge it and produce a file that looks like this:

FRUIT NAME  Fruit A  Fruit B  Fruit C
Apple       1        1        0
Banana      1        0        0
Orange      1        0        1
Coconut     0        1        0
Lime        0        0        1

Any advice would be greatly appreciated.

Upvotes: 0

Views: 51

Answers (1)

AnilGoyal

Reputation: 26218

  • First, read each .csv file into its own vector, since you've said these are just lists of names.
  • Then take the unique elements across all of those vectors.
  • Use the %in% operator to generate a 0/1 column for each file.
library(dplyr)  # provides %>% and mutate()

# One character vector per file
fruitA <- c('apple', 'banana', 'orange')
fruitB <- c('apple', 'coconut')
fruitC <- c('orange', 'lime')

# One row per unique name; each column flags membership in one vector
data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))) %>%
  mutate(colA = as.numeric(fruitname %in% fruitA),
         colB = as.numeric(fruitname %in% fruitB),
         colC = as.numeric(fruitname %in% fruitC))

  fruitname colA colB colC
1     apple    1    1    0
2    banana    1    0    0
3    orange    1    0    1
4   coconut    0    1    0
5      lime    0    0    1
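
The first bullet (getting each file into a vector) is not shown in the code above. A minimal sketch of that step follows, assuming each .csv has a header row and the names sit in the first column. The helper name read_names and the temporary demo file are made up for illustration; adjust the column choice to match the real files.

```r
# Hypothetical helper: return the first column of a CSV as a character vector.
# Assumes a header row and that the names are in column 1.
read_names <- function(path) {
  unique(as.character(read.csv(path, stringsAsFactors = FALSE)[[1]]))
}

# Demo with a throwaway file standing in for "Fruit A"
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(fruit = c("apple", "banana", "orange"),
                     count = c(2, 3, 1)),
          tmp, row.names = FALSE)

fruitA <- read_names(tmp)
print(fruitA)
```

The count column is read but ignored, which matches the requirement that only presence matters, not the values.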

You may also make use of purrr::reduce2 if you have many items:

library(dplyr)
library(purrr)

# ..1 = accumulated data frame, ..2 = current vector, ..3 = its column name
reduce2(list(fruitA, fruitB, fruitC),
        list("fruitA", "fruitB", "fruitC"),
        .init = data.frame(fruitname = unique(c(fruitA, fruitB, fruitC))),
        ~ ..1 %>% mutate(!!..3 := +(fruitname %in% ..2)))

  fruitname fruitA fruitB fruitC
1     apple      1      1      0
2    banana      1      0      0
3    orange      1      0      1
4   coconut      0      1      0
5      lime      0      0      1
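
With a dozen files it may be easier to keep the vectors in one named list and build every column in a single pass. The following is only a base R sketch of the same %in% idea, not part of the approach above; presence_table is a hypothetical name, and it assumes the input is a named list of character vectors (for real files, such a list could be assembled with list.files() and read.csv()).

```r
# Hypothetical helper: presence/absence table from a named list of vectors.
presence_table <- function(lists) {
  # Every distinct name across all lists, in order of first appearance
  all_names <- unique(unlist(lists, use.names = FALSE))
  # One 0/1 column per list; sapply() keeps the list names as column names
  flags <- sapply(lists, function(x) as.integer(all_names %in% x))
  data.frame(name = all_names, flags, check.names = FALSE)
}

lists <- list(fruitA = c("apple", "banana", "orange"),
              fruitB = c("apple", "coconut"),
              fruitC = c("orange", "lime"))

result <- presence_table(lists)
print(result)
```

The column names come from the names of the list, so naming each element after its source file keeps the table readable.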

Upvotes: 0
