Reputation: 175
I would like to split the following data frame based on the final numbers of each element. So I would like 6 new data frames each with two elements. Here is my attempt at obtaining a data frame of the first subset containing just "ABCD-1" and "ABCC-1", but it doesn't seem to be working.
library("reshape2")
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
bar_f
bar_f$SampleID <- colsplit(bar_f$Barcode, pattern = "-", names = c("a","b"))$b
bar_f.s1 <- subset(barcode_file, barcode_file$SampleID == "1")
bar_f.s1
Can you help?
Thank you,
Abigail
Upvotes: 1
Views: 725
Reputation: 6426
The main idea is to create a factor that defines the grouping for the split. One way is to extract the digit pattern from the provided variable Barcode using a regular expression. Then we convert the resulting character vector of digits to a factor with as.factor().
We can, of course, use other regular expression techniques to get the job done, or more user-friendly wrapper functions from the stringr package, as in the second example (the tidyverse-ish approach).
A base R solution using split:
# The provided data
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
factor_for_split <- regmatches(x = bar_f$Barcode,
m = regexpr(pattern = "[[:digit:]]",
text = bar_f$Barcode))
factor_for_split
#> [1] "1" "1" "2" "2" "3" "3" "4" "4" "5" "5" "6" "6"
# Create a list of 6 data frames as asked
lst <- split(x = bar_f, f = as.factor(factor_for_split))
lst
#> $`1`
#> Barcode
#> 1 ABCD-1
#> 2 ABCC-1
#>
#> $`2`
#> Barcode
#> 3 ABCD-2
#> 4 ABCC-2
#>
#> $`3`
#> Barcode
#> 5 ABCD-3
#> 6 ABCC-3
#>
#> $`4`
#> Barcode
#> 7 ABCD-4
#> 8 ABCC-4
#>
#> $`5`
#> Barcode
#> 9 ABCD-5
#> 10 ABCC-5
#>
#> $`6`
#> Barcode
#> 11 ABCD-6
#> 12 ABCC-6
# Edit names of the list
names(lst) <- paste0("df_", names(lst))
# Assign each data frame from the list to a data frame object in the global
# environment
for(name in names(lst)) {
assign(name, lst[[name]])
}
Created on 2020-02-24 by the reprex package (v0.3.0)
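Note that the pattern "[[:digit:]]" picks up only the first single digit, which is enough for the sample data. Here is a minimal sketch for the case where the suffix has more than one digit. The barcodes "ABCD-10" and "ABCC-10" and the object names bar_f2, suffix, groups and lst2 are made up for illustration and are not part of the question:

```r
# Hypothetical barcodes with a multi-digit suffix (not in the original data)
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-10", "ABCC-10")
bar_f2 <- data.frame(Barcode)

# Keep everything after the last hyphen, provided it consists of digits only
suffix <- sub(".*-([[:digit:]]+)$", "\\1", as.character(bar_f2$Barcode))

# Order the factor levels numerically so that "10" comes after "2"
groups <- factor(suffix, levels = sort(unique(as.integer(suffix))))

lst2 <- split(x = bar_f2, f = groups)
names(lst2)
```

Setting the levels from the sorted integers is only a choice about the order of the list; split() groups the rows the same way with a plain as.factor() call.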
And, if you prefer, here is a tidyverse-ish approach:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
# Store the result so the list can be named and assigned afterwards
lst <- bar_f %>%
mutate(factor_for_split = str_extract(string = Barcode,
pattern = "[[:digit:]]")) %>%
group_split(factor_for_split)
lst
#> [[1]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-1 1
#> 2 ABCC-1 1
#>
#> [[2]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-2 2
#> 2 ABCC-2 2
#>
#> [[3]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-3 3
#> 2 ABCC-3 3
#>
#> [[4]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-4 4
#> 2 ABCC-4 4
#>
#> [[5]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-5 5
#> 2 ABCC-5 5
#>
#> [[6]]
#> # A tibble: 2 x 2
#> Barcode factor_for_split
#> <fct> <chr>
#> 1 ABCD-6 6
#> 2 ABCC-6 6
#>
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: Barcode <fct>, factor_for_split <chr>
# Name the list elements and assign each one to the global environment
names(lst) <- paste0("df_", seq_along(lst))
for(name in names(lst)) {
assign(name, lst[[name]])
}
Created on 2020-02-24 by the reprex package (v0.3.0)
Upvotes: 3
Reputation: 6489
Here is another solution using built-in functions:
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))
for(nm in names(dfs)) assign(nm, dfs[[nm]])
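If you don't need separate objects in the global environment, you can also skip the assign() step and work with the list directly. A minimal sketch, reusing the data from the question:

```r
# The data from the question
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

# Split on the digits that remain after removing every non-digit character
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))

# Access a single subset by its group name
dfs[["1"]]

# Apply the same operation to every subset, here counting rows
sapply(dfs, nrow)
```

Keeping everything in one list makes it easy to loop over the subsets with lapply() or sapply() later on.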
Upvotes: 1
Reputation: 17648
You can try:
library(tidyverse)
separate(bar_f, Barcode, into = letters[1:2], sep ="-")
and the full tidyverse way could look like:
bar_f %>%
separate(Barcode, into = letters[1:2], sep ="-") %>%
filter(b == 1)
a b
1 ABCD 1
2 ABCC 1
In base R you can try a gsub which removes lowercase letters, uppercase letters and the hyphen:
bar_f$SampleID <- gsub("[A-Za-z-]", "", bar_f$Barcode)
head(bar_f)
Barcode SampleID
1 ABCD-1 1
2 ABCC-1 1
3 ABCD-2 2
4 ABCC-2 2
5 ABCD-3 3
6 ABCC-3 3
Upvotes: 1