rashxxx

Reputation: 117

Finding the first unique occurrence in file

A log file refers to several .csv files, and I need to extract a single occurrence of each file name, including its extension.

Example: Abcde.csv

The line below finds every csv file, together with its full path, between the words "Importing file ./" and ".". It also does not treat "." as the full stop at the end of the sentence, so it prints the next line as well.

for result in re.findall(r'Importing file ./(.*?).', fp.read(), re.S):
    print(result)

Is there any way to get only the file name with its extension?

Current result:

/user/path/abcde.csv
Line number :1235

Expectation:

abcde.csv

Update: the current log line looks like this:

Line number 12983: Importing file /user/path/abcde.csv.

Upvotes: 2

Views: 120

Answers (3)

Wiktor Stribiżew

Reputation: 627600

You can use:

import re

for result in re.findall(r'Importing file \./(?:.*/)?(.+)\.', fp.read()):
    print(result)

See the regex demo. Details:

  • Importing file \./ - the literal string Importing file ./
  • (?:.*/)? - an optional occurrence of any text ending with / (to get to the last / on the line)
  • (.+) - Group 1 (the result): one or more chars other than line break chars, as many as possible
  • \. - a literal dot; because (.+) is greedy, this is the last dot on the line (the full stop)
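A minimal sketch of this pattern on made-up log text. It assumes the log lines use the Importing file ./... form the pattern expects, and it uses re.finditer, which yields match objects (so .group(1) works), whereas re.findall returns the captured strings directly.

```python
import re

# Hypothetical log text in the "Importing file ./..." format this pattern expects.
log_text = (
    "Line number 1234: Starting import.\n"
    "Line number 1235: Importing file ./user/path/abcde.csv.\n"
)

pattern = re.compile(r'Importing file \./(?:.*/)?(.+)\.')

# finditer yields Match objects, so .group(1) is available on each one;
# pattern.findall would instead return the captured strings directly.
names = [m.group(1) for m in pattern.finditer(log_text)]
print(names)  # ['abcde.csv']
```

Without re.S, . stops at the end of each line, so (?:.*/)? and (.+) cannot spill onto the following line.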

Upvotes: 1

The fourth bird

Reputation: 163632

For this example string:

Line number 12983: Importing file /user/path/abcde.csv.

you can use:

\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\.
  • \bImporting file Match a word boundary, then Importing file literally
  • (?: Non-capturing group
    • /[^/\n]+ Match / and 1 or more chars other than / or a newline
  • )* Close the non-capturing group and repeat it zero or more times
  • / Match a /
  • ( Capture group 1
    • [^/\n]+\.csv Match 1+ chars other than / or a newline and then .csv
  • )\. Close group 1 and match the trailing dot

Regex demo

Example

import re

for result in re.findall(r"\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\.", fp.read()):
    print(result)

Output

abcde.csv
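The question asks for only one occurrence of each file name. As a sketch, assuming the log can repeat the same Importing file line (the sample text below is invented), the findall result can be de-duplicated with dict.fromkeys, which keeps the first occurrence of each name in order, or re.search can be used to stop at the first match.

```python
import re

# Hypothetical log: the same file is imported twice, plus a second file.
log_text = (
    "Line number 12983: Importing file /user/path/abcde.csv.\n"
    "Line number 13001: Importing file /user/path/abcde.csv.\n"
    "Line number 13050: Importing file /other/dir/fghij.csv.\n"
)

pattern = r"\bImporting file (?:/[^/\n]+)*/([^/\n]+\.csv)\."

# dict.fromkeys keeps the first occurrence of each name, preserving order.
unique_names = list(dict.fromkeys(re.findall(pattern, log_text)))
print(unique_names)  # ['abcde.csv', 'fghij.csv']

# If only the very first match is needed, re.search stops at it.
first = re.search(pattern, log_text)
print(first.group(1) if first else None)  # abcde.csv
```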

Upvotes: 1

Scrapper142

Reputation: 586

I would recommend using re.search to find the path, like so:

import re

filename = re.search(r'Importing file (/.+)*/(.*\.csv)', fp.read()).group(2)

Several things are happening here:

  • re.search scans a string and returns the first match of a regex
  • .+ matches one or more characters (any character except a newline)
  • .* matches any number of characters (possibly none)
  • (/.+)* matches paths such as '/aaa/bbb/ccc', i.e. a slash followed by characters, repeated any number of times
  • (.*\.csv) matches a csv file name such as 'anyfilename.csv'; the escaped \. matches a literal dot
  • group(2) returns only the text captured by the second set of parentheses, (.*\.csv), which is the file name

Also note that I removed the re.S flag, so . no longer matches newlines and the file name cannot run onto the next line.
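A minimal sketch of the re.search approach on invented log text, with the match checked before calling .group(), since re.search returns None when nothing matches. The last two searches only illustrate why the dot before csv is escaped: the unescaped form (.*.csv) also accepts a name like abcde_csv.

```python
import re

# Hypothetical log excerpt; the first line has no "Importing file" entry.
log_text = (
    "Line number 12982: Reading configuration.\n"
    "Line number 12983: Importing file /user/path/abcde.csv.\n"
)

# \.csv matches a literal ".csv"; re.search stops at the first match.
match = re.search(r'Importing file (/.+)*/(.*\.csv)', log_text)

# re.search returns None when nothing matches, so check before calling .group().
filename = match.group(2) if match else None
print(filename)  # abcde.csv

# With an unescaped dot, "abcde_csv" (no ".csv" extension) is also accepted.
loose = re.search(r'Importing file (/.+)*/(.*.csv)',
                  "Importing file /user/path/abcde_csv.")
print(loose.group(2))  # abcde_csv

# The escaped version correctly finds nothing in the same text.
strict = re.search(r'Importing file (/.+)*/(.*\.csv)',
                   "Importing file /user/path/abcde_csv.")
print(strict)  # None
```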

Upvotes: 1
