Reputation: 63
I currently have a data frame called 'gene_clusters' with columns of unequal length, like so:
> gene_clusters
hi lo mid
gene1 gene5 gene3
gene2 gene8 gene9
gene7 NA gene10
NA NA gene4
I imported it from Excel, so I think R has compensated for the unequal column lengths by padding the shorter columns with NAs. Essentially, I want to create a named list like this:
>GeneList
## $hi
## [1] "gene1" "gene2" "gene7"
##
## $lo
## [1] "gene5" "gene8"
##
## $mid
## [1] "gene3" "gene9" "gene10" "gene4"
(This is just an example of my data, but in reality I have > 100 genes)
Upvotes: 1
Views: 31
Reputation: 887168
We can do this with stack/split from base R:
with(na.omit(stack(df)[2:1]), split(values, ind))
#$hi
#[1] "gene1" "gene2" "gene7"
#$lo
#[1] "gene5" "gene8"
#$mid
#[1] "gene3" "gene9" "gene10" "gene4"
Or with lapply:
lapply(df, function(x) x[complete.cases(x)])
Data:
df <- structure(list(hi = c("gene1", "gene2", "gene7", NA),
                     lo = c("gene5", "gene8", NA, NA),
                     mid = c("gene3", "gene9", "gene10", "gene4")),
                class = "data.frame", row.names = c(NA, -4L))
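The stack/split one-liner above can be easier to follow when run step by step. This is a minimal sketch on the same example data; the intermediate names long, long_clean and gene_list are illustrative only and do not appear in the answer.

```r
# Same example data as in the answer, written with data.frame()
df <- data.frame(
  hi  = c("gene1", "gene2", "gene7", NA),
  lo  = c("gene5", "gene8", NA, NA),
  mid = c("gene3", "gene9", "gene10", "gene4"),
  stringsAsFactors = FALSE
)

# stack() reshapes to long form: one row per cell, with the columns
# 'values' (cell contents) and 'ind' (name of the source column)
long <- stack(df)

# na.omit() on a data frame drops every row that contains an NA
long_clean <- na.omit(long)

# split() groups the remaining values by their source column
gene_list <- split(long_clean$values, long_clean$ind)
str(gene_list)
```

Because 'ind' is a factor whose levels follow the original column order, the list elements should come out in the same order as the columns of df.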
Upvotes: 1
Reputation: 11584
Using map from the purrr package:
> map(df,~ .[!is.na(.)])
$hi
[1] "gene1" "gene2" "gene7"
$lo
[1] "gene5" "gene8"
$mid
[1] "gene3" "gene9" "gene10" "gene4"
Upvotes: 0
Reputation: 388982
Use na.omit to drop all the NA values. Note that na.omit attaches an extra attribute recording what it removed, which you can strip from the result.
sapply(df, function(x) {x1 <- na.omit(x);attributes(x1) <- NULL;x1})
#$hi
#[1] "gene1" "gene2" "gene7"
#$lo
#[1] "gene5" "gene8"
#$mid
#[1] "gene3" "gene9" "gene10" "gene4"
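Two details of this approach may be worth checking on your own data. First, na.omit records the dropped positions in an "na.action" attribute, which is what the code above removes. Second, sapply simplifies its result to a matrix when every column ends up with the same length, so lapply may be the safer choice if a list is always wanted. The sketch below assumes the same example df; the data frame named full is a hypothetical example, and as.vector is used here as one alternative way to strip the attribute.

```r
df <- data.frame(
  hi  = c("gene1", "gene2", "gene7", NA),
  lo  = c("gene5", "gene8", NA, NA),
  mid = c("gene3", "gene9", "gene10", "gene4"),
  stringsAsFactors = FALSE
)

# na.omit() on a vector keeps a record of the removed positions
x1 <- na.omit(df$hi)
has_na_action <- !is.null(attr(x1, "na.action"))

# as.vector() drops attributes from an atomic vector, and
# lapply() always returns a list, whatever the column lengths
gene_list <- lapply(df, function(x) as.vector(na.omit(x)))

# Hypothetical data frame with no NAs and equal-length columns
full <- data.frame(a = c("x", "y"), b = c("z", "w"),
                   stringsAsFactors = FALSE)
drop_na <- function(x) as.vector(na.omit(x))
simplified <- sapply(full, drop_na)   # simplified to a matrix
kept_list  <- lapply(full, drop_na)   # stays a list
```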
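If your column names don't contain digits, as in the shared example, you can also use split. unlist appends a running index to each column name (hi1, hi2, ...), so removing the digits recovers the column each value came from: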
tmp <- na.omit(unlist(df))
split(tmp, gsub('\\d+', '', names(tmp)))
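To make the role of the names explicit, here is a small sketch of the same idea on the example data. It differs from the code above in two ways that are my own assumptions rather than part of the answer: the pattern is anchored with $ so that only the trailing index is removed, and unname is applied so the list elements do not carry names such as hi1.

```r
df <- data.frame(
  hi  = c("gene1", "gene2", "gene7", NA),
  lo  = c("gene5", "gene8", NA, NA),
  mid = c("gene3", "gene9", "gene10", "gene4"),
  stringsAsFactors = FALSE
)

# unlist() flattens the columns into one named vector; each name is
# the column name followed by a running index (hi1, hi2, ...)
tmp <- na.omit(unlist(df))
print(names(tmp))

# Remove only the trailing digits to recover the column name
groups <- sub("\\d+$", "", names(tmp))

# Group by column name and drop the per-element names
gene_list <- lapply(split(tmp, groups), unname)
str(gene_list)
```

Anchoring the pattern keeps digits that occur elsewhere in a column name, but a column whose name itself ends in a digit would still be mislabeled. Note also that split orders the groups alphabetically, which happens to match the column order in this example.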
Upvotes: 2