quant

Reputation: 4492

How to get substring between two numbers (of unknown digit length) in python

I have a string that looks like this: a = 'readyM01JUN_01_18_0144.xlsx', and I would like to extract the JUN.

My first thought was to split a on runs of digits, but a.split('[0-9]+') doesn't work. Any ideas?

Upvotes: 1

Views: 672

Answers (3)

RoadRunner

Reputation: 26325

You could also try an iterative approach like this, which removes every occurrence of a known substring:

import re

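def remove_string(string, sub):
    res = string
    # Match positions refer to the original string, so track how far
    # earlier removals have shifted everything to the left.
    offset = 0
    # re.escape makes sub match literally, even if it contains regex metacharacters
    for loc in re.finditer(re.escape(sub), string):
        res = res[:loc.start() + offset] + res[loc.end() + offset:]
        offset -= len(sub)

    return res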

Which outputs:

>>> remove_string('readyM01JUN_01_18_0144.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'
>>> remove_string('readyM01JUN_01_18_0144JUN.xlsx', 'JUN')
'readyM01_01_18_0144.xlsx'

Upvotes: 0

Simeon Ikudabo

Reputation: 2190

I'm not sure what your program's objective is, but if JUN stands for June and your data contains a series of month abbreviations that you want to remove, I would create a list of months, iterate through it, and replace each month found in the string you are working on. You can get JUN out of the string by calling the .replace() method on a and assigning the result back to a, since strings are immutable and the method returns a new string. Here is an example:

months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
a = 'readyM01JUN_01_18_0144.xlsx'

for month in months:
    if month in a:
        # str.replace returns a new string, so rebind a
        a = a.replace(month, '')
        print(a)

OUTPUT:

readyM01_01_18_0144.xlsx
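
If the goal is to pull the month out rather than delete it, a month list like the one above can also drive a regex alternation. This is only a minimal sketch, assuming three-letter uppercase abbreviations; the names MONTHS, MONTH_RE and find_month are made up for illustration:

```python
import re

# Assumed three-letter uppercase month abbreviations.
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# One pattern that matches any of the abbreviations.
MONTH_RE = re.compile('|'.join(MONTHS))

def find_month(name):
    """Return the first month abbreviation found in name, or None."""
    m = MONTH_RE.search(name)
    return m.group(0) if m else None

print(find_month('readyM01JUN_01_18_0144.xlsx'))
print(find_month('report_2018.xlsx'))
```

Keep in mind that this is a plain substring match, so a name containing something like MARKET would report MAR.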

Upvotes: 1

Wiktor Stribiżew

Reputation: 627128

Since a is a string, a.split is str.split, which treats its argument as a literal separator, not as a regex. To split on a regex pattern, you need re.split.
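
A quick sketch of the difference, using the string from the question (the variable names literal_parts and regex_parts are just for illustration):

```python
import re

a = 'readyM01JUN_01_18_0144.xlsx'

# str.split looks for the literal text '[0-9]+', which never occurs in a,
# so the whole string comes back as a single element.
literal_parts = a.split('[0-9]+')

# re.split treats the pattern as a regex and splits on every run of digits.
regex_parts = re.split(r'[0-9]+', a)

print(literal_parts)  # ['readyM01JUN_01_18_0144.xlsx']
print(regex_parts)    # ['readyM', 'JUN_', '_', '_', '.xlsx']
```

Even with re.split, the month comes back with its trailing underscore ('JUN_'), so splitting alone does not isolate it.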

However, you may use

import re
a = 'readyM01JUN_01_18_0144.xlsx'
m = re.search(r'\d([^_\d]+)_\d', a) # Or, r'\d([a-zA-Z]+)_\d'
if m:
    print(m.group(1))


Pattern details

  • \d - a digit
  • ([^_\d]+) - Group 1 matching and capturing (m.group(1) will hold this value) 1+ chars other than digits and _ (you may even use ([a-zA-Z]+) to match 1+ ASCII letters)
  • _\d - a _ and a digit.


Note that re.search returns only the first (leftmost) match, even if the pattern matches in several places.
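
If a name could contain more than one such run of letters, re.findall with the same pattern collects all of them. A small sketch with a made-up file name (b is hypothetical, not from the question):

```python
import re

# Hypothetical file name with two letter runs that sit between a digit and '_<digit>'.
b = 'readyM01JUN_01_18_0144JUL_02.xlsx'
pattern = r'\d([^_\d]+)_\d'

# re.search stops at the leftmost match.
first = re.search(pattern, b)
print(first.group(1))   # JUN

# With a single capturing group, re.findall returns the captured text of every match.
all_matches = re.findall(pattern, b)
print(all_matches)      # ['JUN', 'JUL']
```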

Upvotes: 2
