Reputation: 45
I have a large list of lists. There are 46 lists in "output". Each list is a tibble with differing number of rows and columns. My immediate goal is to subset a specific column from each list.
This is str(output) of the first two lists to give you an idea of the data.
> str(output)
List of 46
$ Brain :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6108 obs. of 8 variables:
..$ p_val : chr [1:6108] "0" "1.60383253411205E-274" "0" "0" ...
..$ avg_diff : num [1:6108] 1.71 1.7 1.68 1.6 1.58 ...
..$ pct.1 : num [1:6108] 0.998 0.808 0.879 0.885 0.923 0.905 0.951 0.957 0.619 0.985 ...
..$ pct.2 : num [1:6108] 0.677 0.227 0.273 0.323 0.36 0.384 0.401 0.444 0.152 0.539 ...
..$ cluster : num [1:6108] 1 1 1 1 1 1 1 1 1 1 ...
..$ gene : chr [1:6108] "Plp1" "Mal" "Ermn" "Stmn4" ...
..$ X__1 : logi [1:6108] NA NA NA NA NA NA ...
..$ Cell Type: chr [1:6108] "Myelinating oligodendrocyte" NA NA NA ...
$ Bladder :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4656 obs. of 8 variables:
..$ p_val : num [1:4656] 0.00 1.17e-233 2.85e-276 0.00 0.00 ...
..$ avg_diff : num [1:4656] 2.41 2.23 2.04 2.01 1.98 ...
..$ pct.1 : num [1:4656] 0.833 0.612 0.855 0.987 1 0.951 0.711 0.544 0.683 0.516 ...
..$ pct.2 : num [1:4656] 0.074 0.048 0.191 0.373 0.906 0.217 0.105 0.044 0.177 0.106 ...
..$ cluster : num [1:4656] 1 1 1 1 1 1 1 1 1 1 ...
..$ gene : chr [1:4656] "Dpt" "Gas1" "Cxcl12" "Lum" ...
..$ X__1 : logi [1:4656] NA NA NA NA NA NA ...
..$ Cell Type: chr [1:4656] "Stromal cell_Dpt high" NA NA NA ...
Since I have a large number of lists that make up the list, I have been trying to create an iterative code to perform tasks. This hasn't been successful.
I can achieve this manually, or list by list, but I haven't been successful in finding an iterative way of doing this.
x <- data.frame(output$Brain, stringsAsFactors = FALSE)
tmp.list <- x$Cell.Type
tmp.output <- purrr::discard(tmp.list, is.na)
x <- subset(x, Cell.Type %in% tmp.output)
This gives me the output that I want, which are the rows in the column "Cell.Type" with non-NA values.
I got as far as the code below to get the 8th column of each list, which is the "Cell.Type" column.
lapply(output, "[", , 8))
But here I found that the naming and positioning of the "Cell.Type" column in each list is not consistent. This means I cannot use the lapply function to subset the 8th columns, as some lists have this on for example the 9th column.
I tried the code below, but it does not work and gets an error.
lapply(output, "[", , c('Cell.Type', 'celltyppe'))
#Error: Column `celltyppe` not found
#Call `rlang::last_error()` to see a backtrace
Essentially, from my "output" list, I want to subset either columns "Cell.Type" or "celltyppe" from each of the 46 lists to create a new list with 46 lists of just a single column of values. Then I want to drop all rows with NA.
I would like to perform this using some sort of loop.
At the moment I have not had much success. Lapply seems to be able to extract columns through lists iterately, and I am having difficultly trying to subset names columns.
Once I can do this, I then want to create a loop that can subset only rows without NA.
FINAL CODE
This is the final code I have used to create exactly what I had hoped for. The first line of the code specifies the loop to go through each list of the large list. The second line of code selects columns of each list that contains "ell" in its name (Cell type, Cell Type, or celltyppe). The last removes any rows with "na".
purrr::map(output, ~ .x %>%
dplyr::select(matches("ell")) %>%
na.omit)
Upvotes: 1
Views: 63
Reputation: 887531
We can use anonymous function call
lapply(output, function(x) na.omit(x[grep("(?i)Cell\\.?(?i)Typp?e", names(x))]))
#[[1]]
# Cell.Type
#1 1
#2 2
#3 3
#4 4
#5 5
#[[2]]
# celltyppe
#1 7
#2 8
#3 9
#4 10
#5 11
Also with purrr
library(tidyverse)
map(output, ~ .x %>%
select(matches("(?i)Cell\\.?(?i)Typp?e") %>%
na.omit))
output <- list(data.frame(Cell.Type = 1:5, col1 = 6:10, col2 = 11:15),
data.frame(coln = 1:5, celltyppe = 7:11))
Upvotes: 1