Reputation: 279
I need to extract every value matching a pattern (all values starting with 'EXO') from all columns of a data frame, and then reorder each column accordingly.
I tried a traditional R approach with apply, but I get an error because the columns end up with different numbers of matching rows. I think it would be easier with a dplyr or tidyr function.
I tried:
df2 = as.data.frame(apply(df,2,function (x) x[grepl("^EXO",x)]))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 1, 7
Example data:
df <- data.frame(
  Group1 = c("EMX1", "EXO_C3L4", "FAF2P1", "FAM224A", "NA",
             "FAM43A", "FAT4", "EXO_FEZF1-AS1"),
  Group2 = c("EXO_BRPF3", "NA", "NA", "CCDC187",
             "CCDC200", "CCDC7", "CCL27", "CD6"),
  Group3 = c("SNORD114-18", "SNORD115-10", "SPATA31B1P", "SPIC",
             "NA", "EXO_TATDN2P3", "EXO_TCIM", "TEPP"),
  Group4 = c("EXO_SATB2-AS1", "NA", "EXO_SFTA3", "EXO_SIX3-AS1", "EXO_SMIM2-IT1",
             "EXO_SNORA46", "EXO_SNORD101", "EXO_SNORD114-18")
)
> df
         Group1    Group2       Group3          Group4
1          EMX1 EXO_BRPF3  SNORD114-18   EXO_SATB2-AS1
2      EXO_C3L4        NA  SNORD115-10              NA
3        FAF2P1        NA   SPATA31B1P       EXO_SFTA3
4       FAM224A   CCDC187         SPIC    EXO_SIX3-AS1
5            NA   CCDC200           NA   EXO_SMIM2-IT1
6        FAM43A     CCDC7 EXO_TATDN2P3     EXO_SNORA46
7          FAT4     CCL27     EXO_TCIM    EXO_SNORD101
8 EXO_FEZF1-AS1       CD6         TEPP EXO_SNORD114-18
Desired result after extracting the 'EXO' values:
> df2
         Group1    Group2       Group3          Group4
1            NA EXO_BRPF3           NA   EXO_SATB2-AS1
2      EXO_C3L4        NA           NA              NA
3            NA        NA           NA       EXO_SFTA3
4            NA        NA           NA    EXO_SIX3-AS1
5            NA        NA           NA   EXO_SMIM2-IT1
6            NA        NA EXO_TATDN2P3     EXO_SNORA46
7            NA        NA     EXO_TCIM    EXO_SNORD101
8 EXO_FEZF1-AS1        NA           NA EXO_SNORD114-18
# And after that, reorder each column alphabetically:
> df3
         Group1    Group2       Group3          Group4
1      EXO_C3L4 EXO_BRPF3 EXO_TATDN2P3   EXO_SATB2-AS1
2 EXO_FEZF1-AS1        NA     EXO_TCIM       EXO_SFTA3
3            NA        NA           NA    EXO_SIX3-AS1
4            NA        NA           NA   EXO_SMIM2-IT1
5            NA        NA           NA     EXO_SNORA46
6            NA        NA           NA    EXO_SNORD101
7            NA        NA           NA EXO_SNORD114-18
8            NA        NA           NA              NA
Upvotes: 1
Views: 109
Reputation: 886958
It looks like each column should be ordered independently. We replace
the values in each column that do not contain the 'EXO' substring with NA
and then order the column, which moves the NA values to the end:
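library(dplyr)
library(stringr)

df %>%
  # In every column, set values without "EXO" to NA, then sort the column;
  # order() places NA last by default
  mutate_all(~ replace(., !str_detect(., "EXO"), NA_character_) %>%
               {.[order(.)]})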
#         Group1    Group2       Group3          Group4
#1      EXO_C3L4 EXO_BRPF3 EXO_TATDN2P3   EXO_SATB2-AS1
#2 EXO_FEZF1-AS1      <NA>     EXO_TCIM       EXO_SFTA3
#3          <NA>      <NA>         <NA>    EXO_SIX3-AS1
#4          <NA>      <NA>         <NA>   EXO_SMIM2-IT1
#5          <NA>      <NA>         <NA>     EXO_SNORA46
#6          <NA>      <NA>         <NA>    EXO_SNORD101
#7          <NA>      <NA>         <NA> EXO_SNORD114-18
#8          <NA>      <NA>         <NA>            <NA>
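As for the error in the original attempt: apply() returns a list here because the columns contain different numbers of matches (2, 1 and 7), and as.data.frame() cannot combine vectors of unequal length. If a base R version is preferred, one possible sketch is to pad each column's matches with NA back to the original length. The helper name keep_exo is made up for illustration, and the "^EXO" anchor follows the question's own pattern rather than the plain substring match used above.

```r
# Example data from the question
df <- data.frame(
  Group1 = c("EMX1", "EXO_C3L4", "FAF2P1", "FAM224A", "NA",
             "FAM43A", "FAT4", "EXO_FEZF1-AS1"),
  Group2 = c("EXO_BRPF3", "NA", "NA", "CCDC187",
             "CCDC200", "CCDC7", "CCL27", "CD6"),
  Group3 = c("SNORD114-18", "SNORD115-10", "SPATA31B1P", "SPIC",
             "NA", "EXO_TATDN2P3", "EXO_TCIM", "TEPP"),
  Group4 = c("EXO_SATB2-AS1", "NA", "EXO_SFTA3", "EXO_SIX3-AS1", "EXO_SMIM2-IT1",
             "EXO_SNORA46", "EXO_SNORD101", "EXO_SNORD114-18"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the matching values of one column, sort them,
# and pad with NA so the result has the same length as the input
keep_exo <- function(x, pattern = "^EXO") {
  x <- as.character(x)
  hits <- sort(x[grepl(pattern, x)])
  length(hits) <- length(x)  # extending a vector fills the new slots with NA
  hits
}

# Apply the helper column by column; every column now has the same length,
# so the list converts to a data frame without the "differing number of rows" error
df3 <- as.data.frame(lapply(df, keep_exo), stringsAsFactors = FALSE)
df3
```

Because every column is processed on its own, the rows of df3 no longer correspond to the rows of df, which matches the independent ordering shown in the expected output.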
Upvotes: 1