Mauro

Reputation: 479

regex: pattern fails to match what I am looking for

I have the following code, which tries to retrieve the name of a file from a directory path based on a double \ character:

import re

string = 'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
pattern = r'(?<=*\\\\)*'
re.findall(pattern,string)

The reasoning behind it is that the name of the file always comes after a double \, so I try to match any string that is preceded by any text ending with \.

Nevertheless, when I run this code I get the following error:

error: nothing to repeat at position 4

What am I doing wrong?

Edit: The concrete output I am looking for is the string 'abo_st_gas_dtd.csv' as a match.
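For reference, the compile failure can be reproduced on its own. This is a minimal sketch (the names bad_pattern, error_msg and error_pos are illustrative only) that catches re.error and reads its msg and pos attributes to show which character of the pattern the parser rejected:

```python
import re

# The pattern from the question: the first '*' directly follows '(?<=',
# so there is no preceding token for it to repeat
bad_pattern = r'(?<=*\\\\)*'

error_msg, error_pos = None, None
try:
    re.compile(bad_pattern)
except re.error as exc:
    # re.error exposes the bare message and the index in the pattern where parsing failed
    error_msg, error_pos = exc.msg, exc.pos

print(error_msg, error_pos)          # nothing to repeat 4
print(repr(bad_pattern[error_pos]))  # '*'
```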

Upvotes: 1

Views: 596

Answers (3)

Tim Biegeleisen

Reputation: 521259

Your pattern is just a lookbehind, which by itself is zero-width: it can check what precedes a position, but it can't consume (and therefore can't return) any characters. I would use this re.findall approach:

import re

string = 'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
# a backslash, then capture "name.ext" up to the end of the string
filename = re.findall(r'\\([^.]+\.\w+)$', string)[0]
print(filename)  # abo_st_gas_dtd.csv
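To illustrate the zero-width point, here is a small sketch (variable names are illustrative): a lookbehind on its own only yields an empty match, while the capturing pattern above returns the filename. The same pattern is also spelled out with re.VERBOSE so each piece can be commented.

```python
import re

# In this regular (non-raw) literal, '\\' is ONE backslash character
string = 'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'

# A lookbehind alone matches the empty string right after the backslash
empty_matches = re.findall(r'(?<=\\)', string)
print(empty_matches)  # ['']

# The answer's pattern, written out with re.VERBOSE comments
filename_re = re.compile(r"""
    \\          # a literal backslash
    (           # group 1: the filename
      [^.]+     #   one or more non-dot characters
      \.        #   a literal dot
      \w+       #   the extension (word characters)
    )
    $           # must reach the end of the string
""", re.VERBOSE)

match = filename_re.search(string)
print(match.group(1))  # abo_st_gas_dtd.csv
```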

Upvotes: 2

Yoni

Reputation: 46

# '\\' inside a normal string literal is a single backslash character
files = 'I:E\\trm.csvest/PZMALIo4\\ETRM841_FX_.csvDeals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'
my_files = []
for counter, f in enumerate(files):
    if f == '\\':  # a backslash (ord 92)
        temp = files[counter + 1:]
        temp_file = ""
        for f1 in temp:
            temp_file += f1
            # stop at the first '.' that is followed by the "csv" extension
            if f1 == '.' and temp[len(temp_file):len(temp_file) + 3] == "csv":
                my_files.append(temp_file + "csv")
                break
print(my_files)  # ['trm.csv', 'ETRM841_FX_.csv', 'abo_st_gas_dtd.csv']
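The character loop above can also be expressed as a single regular expression. This is a swapped-in alternative, not the answer's own method: after each backslash, it lazily takes characters up to the first ".csv". On this sample string it gives the same list, but since re.findall returns non-overlapping matches, it can differ from the loop when another backslash occurs before a name's ".csv".

```python
import re

# Same sample string as the loop, with every backslash written as '\\'
files = 'I:E\\trm.csvest/PZMALIo4\\ETRM841_FX_.csvDeals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'

# backslash, then the shortest run of characters ending in ".csv"
my_files = re.findall(r'\\(.*?\.csv)', files)
print(my_files)  # ['trm.csv', 'ETRM841_FX_.csv', 'abo_st_gas_dtd.csv']
```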


Upvotes: 1

Shkaal

Reputation: 76

There are a couple of things going on:

  1. You need to define your string using the same r'...' raw-string notation as the pattern; right now your string contains only a single backslash, because the first of the two escapes the second.
  2. I'm not sure you're using * correctly. It means "repeat the immediately preceding token zero or more times", not "any string" (as it does, e.g., in the usual shell glob patterns). The first * inside the parentheses has nothing preceding it, which makes the regex invalid; hence the error you see. What I think you want is .*, i.e., any character repeated zero or more times. Furthermore, it doesn't belong inside the lookbehind parentheses (Python's re module only accepts fixed-width lookbehinds). A more correct regexp would be r'(?<=\\\\).*':
import re

string = r'I:/Etrmtest/PZMALIo4/ETRM841_FX_Deals_Restructuring/FO_PRE\\abo_st_gas_dtd.csv'

pattern = r'(?<=\\\\).*'

print(re.findall(pattern, string))  # ['abo_st_gas_dtd.csv']
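Both points can be checked directly. A short sketch (the names plain, raw and lookbehind_msg are illustrative): the regular literal holds one backslash and the raw literal two, so the double-backslash lookbehind only matches the raw version, and moving .* into the lookbehind is rejected at compile time.

```python
import re

plain = 'FO_PRE\\abo_st_gas_dtd.csv'  # regular literal: one backslash
raw = r'FO_PRE\\abo_st_gas_dtd.csv'   # raw literal: two backslashes
print(plain.count('\\'), raw.count('\\'))  # 1 2

pattern = r'(?<=\\\\).*'  # lookbehind for two literal backslashes
plain_matches = re.findall(pattern, plain)
raw_matches = re.findall(pattern, raw)
print(plain_matches)  # []
print(raw_matches)    # ['abo_st_gas_dtd.csv']

# A variable-width lookbehind such as (?<=.*\\\\) is not allowed in re
lookbehind_msg = None
try:
    re.compile(r'(?<=.*\\\\).*')
except re.error as exc:
    lookbehind_msg = exc.msg
print(lookbehind_msg)  # look-behind requires fixed-width pattern
```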

Upvotes: 3
