user15051990

Reputation: 1895

Split using regex in python

I want to split the file name from the given file using the regex function re.split. Please find the details below:

SVC_DC = 'JHN097567898_01102019_050514_svc_dc.tar'

My solution:

import regex as re
ans=re.split(os.sep,SVC_DC)

Error: re.error: bad escape (end of pattern) at position 0

Thanks in advance

Upvotes: 0

Views: 1652

Answers (2)

Valdi_Bo

Reputation: 30971

The failure comes from how regular expressions treat the backslash, namely the escaping issue.

E.g. under Windows os.sep = '\\', i.e. a single backslash.

But the backslash has a special meaning in a regex: it escapes the character that follows it. A pattern consisting of one backslash is therefore incomplete, and to match a literal backslash you have to write it twice.

Try the following code:

import re
import os

SVC_DC = 'JHN097567898_01102019_050514_svc_dc.tar'
print(re.split(os.sep * 2, SVC_DC))

The result is:

['JHN097567898_01102019_050514_svc_dc.tar']

As the source string does not contain any backslashes, the result is a list containing only one item (the whole source string).

Edit

To make the regex work under both Windows and Unix, you can try:

print(re.split('\\' + os.sep, SVC_DC))

Note that this regex contains:

  • a hard-coded backslash as the escape character,
  • the path separator used in the current operating system.

Note that the forward slash (in the Unix case) does not require escaping, but escaping it here is still acceptable (not needed, but it works).
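To check the escaping rule on any operating system, here is a minimal sketch that hard-codes the backslash instead of reading os.sep. The Windows-style path and the helper names are made up for illustration; they are not part of the question.

```python
import re

# Hard-coded to mimic Windows, where os.sep is a single backslash
# (an assumption for this demo, so that it behaves the same on Unix).
WINDOWS_SEP = "\\"


def lone_backslash_is_rejected():
    """Return True if a pattern made of one backslash fails to compile."""
    try:
        re.compile(WINDOWS_SEP)
    except re.error:
        return True
    return False


def split_on_backslash(path):
    """Split on literal backslashes using the doubled (escaped) pattern."""
    return re.split(WINDOWS_SEP * 2, path)


# Hypothetical Windows-style path built around the file name from the question.
parts = split_on_backslash(r"C:\data\JHN097567898_01102019_050514_svc_dc.tar")
print(lone_backslash_is_rejected())
print(parts)
```

The last element of parts is the file name; the first call reproduces the condition behind the "bad escape" error.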

Upvotes: 1

jsbueno

Reputation: 110271

If you want a filename, regexes are not your answer.

Python has the pathlib module dedicated to handling file paths. Its objects have methods to get the isolated filename while handling all possible corner cases, as well as methods to open files, list files, and do everything one normally does to a file.

To get the base filename from a path, just use its automatic properties:

In [1]: import pathlib

In [2]: name = pathlib.Path("/home/user/JHN097567898_01102019_050514_svc_dc.tar")

In [3]: name.name
Out[3]: 'JHN097567898_01102019_050514_svc_dc.tar'

In [4]: name.parent
Out[4]: PosixPath('/home/user')
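If the path string comes from a different operating system than the one running the script, the "pure" path classes in pathlib can parse it without touching the file system. A minimal sketch, assuming a made-up Windows path alongside the Unix one above:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Both paths are illustrative; only the file name comes from the question.
unix_path = PurePosixPath("/home/user/JHN097567898_01102019_050514_svc_dc.tar")
win_path = PureWindowsPath(r"C:\data\JHN097567898_01102019_050514_svc_dc.tar")

# .name is the final path component; .stem and .suffix split off the extension.
print(unix_path.name)
print(win_path.name)
print(win_path.stem, win_path.suffix)
```

Both .name values are the same file name, regardless of which separator the path used or which OS runs the code.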

Otherwise, even if you did not use pathlib, os.path.sep is a single character, so there would be no advantage in using re.split at all: a plain str.split would do. There is also os.path.split, which predates pathlib and does the same thing:

In [6]: name = "/home/user/JHN097567898_01102019_050514_svc_dc.tar"

In [7]: import os

In [8]: os.path.split(name)[-1]
Out[8]: 'JHN097567898_01102019_050514_svc_dc.tar'
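For completeness, the plain string approach mentioned above could look like the sketch below. The function name is hypothetical, and the separator is passed in explicitly so the example does not depend on the current OS.

```python
def filename_by_rsplit(path, sep):
    """Return the text after the last separator (the whole string if none)."""
    # rsplit with maxsplit=1 cuts only at the right-most separator.
    return path.rsplit(sep, 1)[-1]


unix_name = filename_by_rsplit(
    "/home/user/JHN097567898_01102019_050514_svc_dc.tar", "/"
)
bare_name = filename_by_rsplit("JHN097567898_01102019_050514_svc_dc.tar", "/")
print(unix_name)
print(bare_name)
```

When the string contains no separator at all, as in the question, the whole string comes back unchanged.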

And last (and in this case, actually least), the reason for the error is that you are on Windows, where os.path.sep is "\". This character alone is not a complete regular expression, because the regex engine expects a character indicating a special sequence to come after the "\". To use it without error, you'd need to do:

 re.split(re.escape(os.path.sep), "myfilepath")
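A runnable version of that call, as a sketch: the separator is passed as an argument here (instead of os.path.sep) so that both the Windows and the Unix case can be tried on one machine. The paths and the function name are illustrative only.

```python
import re


def split_path(path, sep):
    """Split path on sep, escaping sep so it is matched literally."""
    return re.split(re.escape(sep), path)


# Windows-style separator: re.escape turns the lone backslash into a valid pattern.
win_parts = split_path(r"C:\data\JHN097567898_01102019_050514_svc_dc.tar", "\\")

# Unix-style separator: escaping is harmless even though "/" is not special.
unix_parts = split_path("/home/user/JHN097567898_01102019_050514_svc_dc.tar", "/")

print(win_parts)
print(unix_parts)
```

Note the empty first element in the Unix result, which comes from the leading "/".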

Upvotes: 2
