orestisf

Reputation: 353

Matching string between nth occurrence of character in python with RegEx

I'm working with a tar.gz archive that contains txt files, and I'm trying to extract the filename from each TarInfo object, whose member.name property looks like this:

aclImdb/test/neg/1026_2.txt
aclImdb/test/neg/1027_5.txt
...
aclImdb/test/neg/1030_4.txt

I've written the following code, which prints the string test/neg/1268_2:

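import re
import tarfile

# Raw string so the backslash escapes reach the regex engine unchanged
regex = r'/((?:[^/]*/).*?)\.'
with tarfile.open("C:\\Users\\Orestis\\Desktop\\aclImdb_v1.tar.gz") as archive:
    for member in archive.getmembers():
        if member.isreg():  # only regular files, not directories
            m = re.findall(regex, member.name)
            print(m)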

How should I modify the regex to extract only the 1268_2 part of the filenames? Effectively I want to extract the string after the 3rd occurrence of "/" and before the 1st occurrence of ".".

Upvotes: 0

Views: 1198

Answers (1)

jaspersnel

Reputation: 36

You could hardcode this:

.*?\/.*?\/.*?\/(.*?)\.

More elegant is something along the lines of this:

(.*?\/){3}(.*?)\.

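You can simply change the 3 to the number of slashes that precede the part you want. (Note that the group you'll want is the second one, $2. In Python that is group(2) on the match object returned by re.search or re.match.)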
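
As a minimal sketch of how the second pattern could be used from Python, the snippet below runs it against the sample member names from the question instead of the real archive. The helper name extract_stem is hypothetical and only there for illustration. The sketch assumes every name of interest has at least three slashes before the file name, and it returns None otherwise.

```python
import re

# Sample member names copied from the question (stand-ins for member.name)
names = [
    "aclImdb/test/neg/1026_2.txt",
    "aclImdb/test/neg/1027_5.txt",
    "aclImdb/test/neg/1030_4.txt",
]

# Skip three "segment/" prefixes, then capture lazily up to the next dot
pattern = re.compile(r"(.*?/){3}(.*?)\.")

def extract_stem(name):
    """Hypothetical helper: text after the 3rd '/' and before the next '.', or None."""
    match = pattern.match(name)
    # Group 1 is the repeated "segment/" prefix; group 2 is the part we want
    return match.group(2) if match else None

stems = [extract_stem(n) for n in names]
print(stems)
```

Using match.group(2) rather than re.findall avoids getting back a tuple for each match, which is what findall returns when the pattern has more than one capturing group.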

Upvotes: 2
