Abigail575

Reputation: 175

Splitting a data frame based on character string

I would like to split the following data frame based on the final numbers of each element. So I would like 6 new data frames each with two elements. Here is my attempt at obtaining a data frame of the first subset containing just "ABCD-1" and "ABCC-1", but it doesn't seem to be working.

library("reshape2")
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
bar_f

bar_f$SampleID <- colsplit(bar_f$Barcode, pattern = "-", names = c("a","b"))$b
bar_f.s1 <- subset(barcode_file, barcode_file$SampleID == "1")
bar_f.s1

Can you help?

Thank you,

Abigail

Upvotes: 1

Views: 725

Answers (3)

Valentin_Ștefan

Reputation: 6426

The main idea is to create a factor that defines the groups for splitting. One way is to extract the digit from the provided variable Barcode using a regular expression, and then convert the resulting character vector to a factor with as.factor(). Other regular expression techniques would also get the job done, as would the more user-friendly wrapper functions from the stringr package used in the second example (the tidyverse-ish approach).

Example 1

A base R solution using split:

# The provided data
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

factor_for_split <- regmatches(x = bar_f$Barcode,
                               m = regexpr(pattern = "[[:digit:]]",
                                           text = bar_f$Barcode))
factor_for_split
#>  [1] "1" "1" "2" "2" "3" "3" "4" "4" "5" "5" "6" "6"

# Create a list of 6 data frames as asked
lst <- split(x = bar_f, f = as.factor(factor_for_split))
lst
#> $`1`
#>   Barcode
#> 1  ABCD-1
#> 2  ABCC-1
#> 
#> $`2`
#>   Barcode
#> 3  ABCD-2
#> 4  ABCC-2
#> 
#> $`3`
#>   Barcode
#> 5  ABCD-3
#> 6  ABCC-3
#> 
#> $`4`
#>   Barcode
#> 7  ABCD-4
#> 8  ABCC-4
#> 
#> $`5`
#>    Barcode
#> 9   ABCD-5
#> 10  ABCC-5
#> 
#> $`6`
#>    Barcode
#> 11  ABCD-6
#> 12  ABCC-6

# Edit names of the list
names(lst) <- paste0("df_", names(lst))

# Assign each data frame from the list to a data frame object in the global
# environment
for(name in names(lst)) {
  assign(name, lst[[name]])
}

Created on 2020-02-24 by the reprex package (v0.3.0)
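
Note that the pattern "[[:digit:]]" picks up only the first single digit of each barcode. As a minimal sketch, assuming the suffix after the hyphen could have more than one digit (the barcodes below are hypothetical and not from the question, as are the names bar_f2, suffix and lst2), one could instead keep everything after the last hyphen:

```r
# Hypothetical barcodes with multi-digit suffixes (not from the original post)
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-10", "ABCC-10", "ABCD-12", "ABCC-12")
bar_f2 <- data.frame(Barcode)

# Drop everything up to and including the last hyphen, keeping only the suffix
suffix <- sub("^.*-", "", bar_f2$Barcode)

# Convert to integer so that the groups are ordered numerically (1, 10, 12)
lst2 <- split(x = bar_f2, f = as.integer(suffix))
names(lst2) <- paste0("df_", names(lst2))
names(lst2)
```

Converting the suffix to integer is a design choice: split() would also accept the character vector, but the groups would then be ordered alphabetically rather than numerically.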

Example 2

And, if you prefer, here is a tidyverse-ish approach:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

bar_f %>% 
  mutate(factor_for_split = str_extract(string = Barcode,
                                        pattern = "[[:digit:]]")) %>% 
  group_split(factor_for_split)
#> [[1]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-1  1               
#> 2 ABCC-1  1               
#> 
#> [[2]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-2  2               
#> 2 ABCC-2  2               
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-3  3               
#> 2 ABCC-3  3               
#> 
#> [[4]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-4  4               
#> 2 ABCC-4  4               
#> 
#> [[5]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-5  5               
#> 2 ABCC-5  5               
#> 
#> [[6]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-6  6               
#> 2 ABCC-6  6               
#> 
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: Barcode <fct>, factor_for_split <chr>

# Assuming the list returned by group_split() above was stored in `lst`
names(lst) <- paste0("df_", seq_along(lst))
for(name in names(lst)) {
  assign(name, lst[[name]])
}

Created on 2020-02-24 by the reprex package (v0.3.0)

Upvotes: 3

B. Christian Kamgang

Reputation: 6489

Here is another solution using built-in functions:

dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))

for(nm in names(dfs)) assign(nm, dfs[[nm]])
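
As a possible variation on the loop above, the same result can be reached without assign(). The sketch below rebuilds the data from the question so it runs on its own, then shows two options: indexing into the named list directly, or copying all elements into the global environment with base R's list2env(). Whether separate objects are preferable to a single list depends on what you do next.

```r
# Data from the question
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

# Split on the digits left after removing every non-digit character
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))

# Option 1: keep the list and pull out one data frame by name
dfs[["df_1"]]

# Option 2: copy every list element into the global environment in one call
list2env(dfs, envir = globalenv())
```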

Upvotes: 1

Roman

Reputation: 17648

You can try

library(tidyverse)
separate(bar_f, Barcode, into = letters[1:2], sep ="-")

and the full tidyverse way could look like

bar_f %>% 
  separate(Barcode, into = letters[1:2], sep ="-") %>% 
  filter(b == 1)
     a b
1 ABCD 1
2 ABCC 1

In base R, you can try a gsub which removes the lower-case and upper-case letters and the hyphen:

bar_f$SampleID <- gsub("[A-Za-z-]","",bar_f$Barcode)
head(bar_f)
  Barcode SampleID
1  ABCD-1        1
2  ABCC-1        1
3  ABCD-2        2
4  ABCC-2        2
5  ABCD-3        3
6  ABCC-3        3
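
To connect this back to the question: the original attempt called subset() on barcode_file, while the data frame is named bar_f. A minimal sketch of the remaining steps, assuming SampleID is built as above (the name by_sample is only illustrative):

```r
# Data from the question
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

# Strip letters and the hyphen, leaving only the numeric suffix
bar_f$SampleID <- gsub("[A-Za-z-]", "", bar_f$Barcode)

# First subset only: rows whose suffix is "1"
bar_f.s1 <- subset(bar_f, SampleID == "1")
bar_f.s1

# All six subsets at once, as a named list of data frames
by_sample <- split(bar_f, bar_f$SampleID)
names(by_sample)
```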

Upvotes: 1
