user11916948

Reputation: 954

Replace a character at a specific position in filenames

I want to change one of the underscores (_) in each filename to another character, for example -, because these filenames cause problems when they are read in. I want a to become like b. Specifically, I want to replace the second-last underscore. How can I specify this efficiently?

gsub("_", "-") replaces every underscore, so the replacement must also be restricted to a certain location.

a <- c("2018-01-09_B2_HILIC_POS_123_-14b_090.mzML", "2018-01-09_B2_HILIC_POS_243_-12a_026.mzML", "2020-01-09_B2_HILIC_POS_415_893a_059.mzML", "2020-01-18_B3_HILIC_POS_LV7001248356_040.mzML")
b <- c("2018-01-09_B2_HILIC_POS_123--14b_090.mzML", "2018-01-09_B2_HILIC_POS_243--12a_026.mzML", "2020-01-09_B2_HILIC_POS_415-893a_059.mzML", "2020-01-18_B3_HILIC_POS_LV4004365711_040.mzML")

Upvotes: 0

Views: 44

Answers (2)

Ryszard Czech

Reputation: 18621

Use

sub("_(?=[^_]*_[^_]*$)", "-", a, perl=TRUE)

See regex proof.

Explanation

--------------------------------------------------------------------------------
  _                        '_'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^_]*                    any character except: '_' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
    [^_]*                    any character except: '_' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead

See R proof:

a <- c("2018-01-09_B2_HILIC_POS_123_-14b_090.mzML", "2018-01-09_B2_HILIC_POS_243_-12a_026.mzML", "2020-01-09_B2_HILIC_POS_415_893a_059.mzML", "2020-01-18_B3_HILIC_POS_LV7001248356_040.mzML")
sub("_(?=[^_]*_[^_]*$)", "-", a, perl=TRUE)

Results:

[1] "2018-01-09_B2_HILIC_POS_123--14b_090.mzML"    
[2] "2018-01-09_B2_HILIC_POS_243--12a_026.mzML"    
[3] "2020-01-09_B2_HILIC_POS_415-893a_059.mzML"    
[4] "2020-01-18_B3_HILIC_POS-LV7001248356_040.mzML"
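
The look-ahead explained above can be generalized to the n-th underscore counted from the end. The following is only a sketch: the helper name replace_nth_last and its arguments are hypothetical and not part of the original answer. It assumes that strings with fewer than n underscores should be returned unchanged.

```r
# Hypothetical helper (an illustration, not from the answer above):
# replace the n-th underscore counted from the end of each string.
replace_nth_last <- function(x, n = 2, replacement = "-") {
  stopifnot(n >= 1)
  # The look-ahead requires exactly (n - 1) further underscores
  # between the matched underscore and the end of the string.
  pattern <- sprintf("_(?=(?:[^_]*_){%d}[^_]*$)", n - 1)
  sub(pattern, replacement, x, perl = TRUE)
}

x <- "2020-01-09_B2_HILIC_POS_415_893a_059.mzML"
replace_nth_last(x)         # "2020-01-09_B2_HILIC_POS_415-893a_059.mzML"
replace_nth_last(x, n = 1)  # "2020-01-09_B2_HILIC_POS_415_893a-059.mzML"
```

With n = 2 the generated pattern is equivalent to the one used in the answer, since (?:[^_]*_){1}[^_]*$ matches the same text as [^_]*_[^_]*$.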

Upvotes: 1

Ronak Shah

Reputation: 389135

Here is one base R option using sub:

sub('(.*)(_)(.*_.*)$', '\\1-\\3', a)
#[1] "2018-01-09_B2_HILIC_POS_123--14b_090.mzML"    
#[2] "2018-01-09_B2_HILIC_POS_243--12a_026.mzML"    
#[3] "2020-01-09_B2_HILIC_POS_415-893a_059.mzML"    
#[4] "2020-01-18_B3_HILIC_POS-LV7001248356_040.mzML"

Here we divide each string into 3 groups:

The 1st group is everything up to the second-last underscore, which is captured using (.*) and used as a backreference (\\1).

The 2nd group is the second-last underscore, which is replaced with -.

The 3rd group is everything after the second-last underscore, which is captured using (.*_.*) and used as a backreference (\\3).
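
To see what each group captures, the groups can be inspected with regexec() and regmatches() from base R. This is a minimal sketch on a single filename from the question; the variable names x, pattern and groups are chosen for illustration only. It also shows that the underscore itself does not need its own group, which is a small variation and not what the answer wrote.

```r
x <- "2020-01-09_B2_HILIC_POS_415_893a_059.mzML"
pattern <- "(.*)(_)(.*_.*)$"

# regexec() reports the full match followed by each capture group.
groups <- regmatches(x, regexec(pattern, x))[[1]]
groups[2]  # group 1: "2020-01-09_B2_HILIC_POS_415"
groups[3]  # group 2: "_"
groups[4]  # group 3: "893a_059.mzML"

# Variation with two groups: the underscore is matched but not captured.
two_groups <- sub("(.*)_(.*_.*)$", "\\1-\\2", x)
two_groups  # "2020-01-09_B2_HILIC_POS_415-893a_059.mzML"
```

Because the first (.*) is greedy, it takes as much as it can while still leaving one underscore for the middle and one for the last group, which is why the match lands on the second-last underscore.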

Upvotes: 2
