claraina

Reputation: 63

Convert unequal length columns into named list in R

I have a data frame called 'gene_clusters' whose columns hold different numbers of values, like so:

> gene_clusters

  hi    lo    mid
  gene1 gene5 gene3
  gene2 gene8 gene9
  gene7 NA    gene10
  NA    NA    gene4

I imported it from Excel, so I think R has compensated for the unequal lengths by padding the shorter columns with NAs. Essentially, I want to create a named list like this:

>GeneList
## $hi
## [1] "gene1" "gene2" "gene7"  
## 
## $lo
## [1] "gene5" "gene8"  
## 
## $mid
## [1] "gene3" "gene9" "gene10" "gene4"   

(This is just an example; my real data has more than 100 genes.)

Upvotes: 1

Views: 31

Answers (3)

akrun

Reputation: 887168

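We can do this with stack/split from base R: stack reshapes the data frame to long format, na.omit drops the NA rows, and split groups the values by their original column.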

with(na.omit(stack(df)[2:1]), split(values, ind))
#$hi
#[1] "gene1" "gene2" "gene7"

#$lo
#[1] "gene5" "gene8"

#$mid
#[1] "gene3"  "gene9"  "gene10" "gene4" 

Or with lapply, filtering the NAs out of each column in turn:

lapply(df, function(x) x[complete.cases(x)])

data

df <- structure(list(
  hi  = c("gene1", "gene2", "gene7", NA),
  lo  = c("gene5", "gene8", NA, NA),
  mid = c("gene3", "gene9", "gene10", "gene4")
), class = "data.frame", row.names = c(NA, -4L))
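
The stack/split one-liner above can be unpacked into its intermediate steps. This is only an illustrative sketch on the example data; the names long and long_complete are introduced here for readability and are not part of the original answer.

```r
# Example data, same values as in the answer above
df <- data.frame(
  hi  = c("gene1", "gene2", "gene7", NA),
  lo  = c("gene5", "gene8", NA, NA),
  mid = c("gene3", "gene9", "gene10", "gene4"),
  stringsAsFactors = FALSE
)

# Step 1: reshape to long format; stack() returns the columns `values` and `ind`
long <- stack(df)

# Step 2: drop the rows whose gene is NA
long_complete <- na.omit(long)

# Step 3: split the gene names by the column they came from
GeneList <- split(long_complete$values, long_complete$ind)
```

Because ind is a factor whose levels follow the column order, the list elements should come out in the same order as the columns of df.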

Upvotes: 1

Karthik S

Reputation: 11584

Using map from the purrr package:

> library(purrr)
> map(df, ~ .x[!is.na(.x)])
$hi
[1] "gene1" "gene2" "gene7"

$lo
[1] "gene5" "gene8"

$mid
[1] "gene3"  "gene9"  "gene10" "gene4" 

Upvotes: 0

Ronak Shah

Reputation: 388982

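Use na.omit to drop all the NA values. na.omit attaches extra attributes to its result, which you can delete. lapply is used here rather than sapply so that the result is always a list, even if every column happens to have the same number of non-NA values.

lapply(df, function(x) {x1 <- na.omit(x); attributes(x1) <- NULL; x1})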

#$hi
#[1] "gene1" "gene2" "gene7"

#$lo
#[1] "gene5" "gene8"

#$mid
#[1] "gene3"  "gene9"  "gene10" "gene4" 
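
To make the point about attributes concrete, here is a small sketch on a single column of the example data. The variable names x, x1 and x2 are only for illustration.

```r
# One column from the example, padded with NAs
x <- c("gene5", "gene8", NA, NA)

# na.omit() drops the NAs but records which positions it removed
x1 <- na.omit(x)
attr_names <- names(attributes(x1))   # includes "na.action"

# Logical subsetting drops the NAs without attaching anything
x2 <- x[!is.na(x)]
```

If the extra attribute is unwanted, the plain subsetting form avoids having to remove it afterwards.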

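If your column names don't contain digits, as in the shared example, you can also use split. unlist appends a running index to each column name (hi1, hi2, ...), and gsub strips it off again to recover the grouping: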

tmp <- na.omit(unlist(df))
split(tmp, gsub('\\d+', '', names(tmp)))
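
If the column names do contain digits, one possible workaround is to build the grouping vector from names(df) directly instead of parsing it back out of the unlist names. The sketch below assumes hypothetical column names (cluster1, cluster2, cluster10) that are not from the question; vals, grp and keep are illustrative names too.

```r
# Hypothetical data whose column names contain digits
df <- data.frame(
  cluster1  = c("gene1", "gene2", "gene7", NA),
  cluster2  = c("gene5", "gene8", NA, NA),
  cluster10 = c("gene3", "gene9", "gene10", "gene4"),
  stringsAsFactors = FALSE
)

# Flatten column by column, without the auto-generated names
vals <- unlist(df, use.names = FALSE)

# One group label per value, repeating each column name nrow(df) times;
# fixing the levels keeps the list in column order
grp <- factor(rep(names(df), each = nrow(df)), levels = names(df))

# Drop the NAs from values and labels together, then split
keep <- !is.na(vals)
GeneList <- split(vals[keep], grp[keep])
```

Since the labels never pass through gsub, names such as cluster1 and cluster10 stay distinct.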

Upvotes: 2
