Reputation: 117
My log file contains references to .csv files, and I need to find only one occurrence of a file name, including its file extension.
Example: Abcde.csv
I have the line below, but it finds every csv file together with its file path, taking whatever lies between the words "Importing file ./" and ".". It also does not treat the "." as the full stop that ends the sentence, so it prints the next line as well.
for result in re.findall('Importing file ./(.*?).', fp.read(), re.S):
Is there any way I can get only the filename with its file extension?
Current result:
/user/path/abcde.csv
Line number :1235
Expectation:
abcde.csv
Update: the current line is:
Line number 12983: Importing file /user/path/abcde.csv.
Upvotes: 2
Views: 120
Reputation: 627600
You can use
for result in re.findall(r'Importing file \./(?:.*/)?(.+)\.', fp.read()):
    print(result)
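As a quick check that needs no log file, here is a minimal sketch run against an in-memory string. The sample text is made up for illustration. Note that the pattern requires a literal ./ after "Importing file ", so a line whose path starts with a plain / (as in the question's update) would not match. The variant with \.? is only an assumption about how one might accept both forms.

```python
import re

# Hypothetical sample log text, not taken from a real log file.
sample = (
    "Line number 1: Importing file ./user/path/abcde.csv.\n"
    "Line number 2: Importing file /user/path/fghij.csv.\n"
)

# Pattern from the answer: requires a literal "./" before the path.
strict = r'Importing file \./(?:.*/)?(.+)\.'
# Assumed variant: the leading dot is optional, so both path styles match.
relaxed = r'Importing file \.?/(?:.*/)?(.+)\.'

# With a single capture group, re.findall returns a list of strings.
strict_names = re.findall(strict, sample)
relaxed_names = re.findall(relaxed, sample)

print(strict_names)
print(relaxed_names)
```

Because re.findall returns plain strings when the pattern has one capture group, each result can be printed directly.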
See the regex demo. Details:
Importing file \./ - a literal "Importing file ./" string
(?:.*/)? - an optional occurrence of any text ending with / (to get to the last / on the line)
(.+) - Group 1 (the result): one or more chars other than line break chars, as many as possible
\. - a literal . char
Upvotes: 1
Reputation: 163632
For this example string
Line number 12983: Importing file /user/path/abcde.csv.
You can use:
\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\.
\bImporting file - Match literally, starting at a word boundary
(?: - Non capture group
/[^/\n]+ - Match / and 1 or more chars other than / or a newline
)* - Close the non capture group and optionally repeat
/ - Match a /
( - Capture group 1
[^/\n]+\.csv - Match 1+ chars other than / or a newline and then .csv
) - Close group 1
\. - Match the trailing dot
Example
for result in re.findall(r"\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\.", fp.read()):
    print(result)
Output
abcde.csv
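The question asks for only one occurrence of a file name, while re.findall returns every match, repeats included. Below is a minimal sketch of one way to drop the repeats while keeping first-seen order, assuming the same pattern as above. The sample text and the name unique_names are illustrative only.

```python
import re

pattern = r"\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\."

# Hypothetical log text in which the same file is imported twice.
sample = (
    "Line number 12983: Importing file /user/path/abcde.csv.\n"
    "Line number 12990: Importing file /user/other/fghij.csv.\n"
    "Line number 13001: Importing file /user/path/abcde.csv.\n"
)

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so converting back to a list removes duplicates without reordering.
unique_names = list(dict.fromkeys(re.findall(pattern, sample)))
print(unique_names)
```

If order does not matter, a plain set would do as well.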
Upvotes: 1
Reputation: 586
I would recommend using re.search to find the path, like so:
filename = re.search(r'Importing file (/.+)*/(.*\.csv)', fp.read()).group(2)
Here, there are multiple things happening:
re.search searches a string for the first match of a regex
.+ matches one or more characters (any characters except a newline)
.* matches any number of characters (could be none)
(/.+)* matches any path of the form '/aaa/bbb/ccc/etc', by matching a slash followed by characters, any number of times
(.*\.csv) matches a csv file name such as 'anyfilename.csv'; the dot before csv is escaped so that it matches only a literal dot
group(2) returns only the text matched by the second set of parentheses, in this case .*\.csv, the filename
Also note I took out the re.S flag so the filename can't contain newlines
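One caveat: re.search returns None when nothing matches, so calling .group(2) directly raises AttributeError on a log that has no such line. Below is a minimal sketch that guards against this and scans line by line instead of reading the whole file at once. The helper name first_csv_name and the sample lines are hypothetical, and the dot before csv is escaped here.

```python
import re

# Same shape as the pattern above; group 2 holds the file name.
PATTERN = re.compile(r'Importing file (/.+)*/(.*\.csv)')

def first_csv_name(lines):
    """Return the first imported .csv file name, or None if there is none."""
    for line in lines:
        match = PATTERN.search(line)
        if match:  # search() gives None when the line does not match
            return match.group(2)
    return None

# Hypothetical sample lines standing in for the log file.
sample_lines = [
    "Line number 12982: Starting import.",
    "Line number 12983: Importing file /user/path/abcde.csv.",
]
found = first_csv_name(sample_lines)
missing = first_csv_name(["nothing to see here"])
print(found)
print(missing)
```

An open file object yields its lines when iterated, so first_csv_name(fp) should work in place of the sample list.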
Upvotes: 1