user10745594

How to count the number of underscores and split the string on the middle one only?

I would like to count the number of underscores and split the string into two different strings at the middle underscore.

strings <- c('aa_bb_cc_dd_ee_ff', 'cc_hh_ff_zz', 'bb_dd')

Desired Output:

First        Last
"aa_bb_cc"   "dd_ee_ff"
"cc_hh"      "ff_zz"
"bb"         "dd"

Upvotes: 1

Views: 1473

Answers (3)

IceCreamToucan

Reputation: 28705

Adapting nhahtdh's answer here, all you need to do is add a step that counts the underscores (done here with str_count) and uses half that count, rounded down, as the number of underscores to skip before the split.

library(stringr)

strsplit(
  strings, 
  paste0("^[^_]*(?:_[^_]*){", str_count(strings, '_') %/% 2, "}\\K_"), 
  perl = TRUE)

# [[1]]
# [1] "aa_bb_cc" "dd_ee_ff"
# 
# [[2]]
# [1] "cc_hh" "ff_zz"
# 
# [[3]]
# [1] "bb" "dd"
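
To get the two-column layout from the question, the list above can probably be bound into a matrix. A minimal sketch in base R: `parts` is a hypothetical name that simply restates the output printed above, and the code assumes every string was split into exactly two pieces.

```r
# `parts` restates the list printed above (hypothetical variable name)
parts <- list(
  c("aa_bb_cc", "dd_ee_ff"),
  c("cc_hh", "ff_zz"),
  c("bb", "dd")
)

# Stack the pairs row-wise; assumes each element has exactly two pieces
result <- do.call(rbind, parts)
colnames(result) <- c("First", "Last")
result
```

If some strings contain no underscore, the elements will not all have length two and `rbind` would recycle values, so those cases would need handling first.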

Upvotes: 2

Bill O'Brien

Reputation: 882

This assumes an odd number of underscores, and 99 or fewer.

library(stringr)
library(strex)
strings <- c('aa_bb_cc_dd_ee_ff', 'cc_hh_ff_zz', 'bb_dd')

splitMiddleUnderscore <- function(x){
    nUnderscore <- str_count(x, '_')
    middleUnderscore <- match(nUnderscore, seq(1, 99, 2))
    str1 <- str_before_nth(x, '_',  middleUnderscore)
    str2 <- str_after_nth(x, '_', middleUnderscore)
    c(str1, str2)
}

lapply(strings, splitMiddleUnderscore)

#[[1]]
#[1] "aa_bb_cc" "dd_ee_ff"

#[[2]]
#[1] "cc_hh" "ff_zz"

#[[3]]
#[1] "bb" "dd"
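
The limit of 99 comes from `seq(1, 99, 2)`: `match()` returns the position of the underscore count within the odd numbers 1, 3, ..., 99, and that position is the index of the middle underscore. For an odd count `n` it equals `(n + 1) %/% 2`, so the cap could likely be dropped. A quick check of that equivalence, as an illustration only and not a change to the function above (`via_match` and `via_arith` are hypothetical names):

```r
n <- c(1, 3, 5, 99)                    # odd underscore counts
via_match <- match(n, seq(1, 99, 2))   # lookup used in the function above
via_arith <- (n + 1) %/% 2             # same index computed arithmetically
all(via_match == via_arith)

# Counts that are even or above 99 are not in the lookup, so match() gives NA
match(c(4, 101), seq(1, 99, 2))
```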

Upvotes: 1

Dan

Reputation: 12084

Here's a kludgy solution that assumes there is always an odd number of underscores.

# Load libraries
library(stringr)

# Define function
even_split <- function(s){
  # Split string
  tmp <- str_split(s, "_")

  lapply(tmp, function(x){
    # Patch string back together in two pieces
    c(paste(x[1:(length(x)/2)], collapse = "_"),
      paste(x[(1+length(x)/2):length(x)], collapse = "_"))
  })
}

# Example
strings <- c('aa_bb_cc_dd_ee_ff', 'cc_hh_ff_zz', 'bb_dd')

# Test function
even_split(strings)
#> [[1]]
#> [1] "aa_bb_cc" "dd_ee_ff"
#> 
#> [[2]]
#> [1] "cc_hh" "ff_zz"
#> 
#> [[3]]
#> [1] "bb" "dd"

Created on 2019-01-18 by the reprex package (v0.2.1)
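
With an even number of underscores, `length(x)/2` is fractional and R truncates fractional indices, so the last piece is silently dropped ('a_b_c_d_e' would give 'a_b' and 'c_d'). A minimal sketch of a guard using integer division, in base R only. `even_split_safe` is a hypothetical name, and sending the extra piece to the second half is an assumption, since there is no single middle underscore in that case.

```r
# Hypothetical variant of even_split(); base R only
even_split_safe <- function(s) {
  lapply(strsplit(s, "_", fixed = TRUE), function(x) {
    # Integer division avoids fractional indices
    half <- length(x) %/% 2
    # Assumption: with an odd number of pieces, the extra one goes to the second half
    c(paste(head(x, half), collapse = "_"),
      paste(tail(x, length(x) - half), collapse = "_"))
  })
}

even_split_safe(c("aa_bb_cc_dd_ee_ff", "a_b_c_d_e"))
```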

Upvotes: 3
