Reputation: 2870
I'm trying to extract the string between the opening quotation mark " and .pdf. For example:
"../matlab/license_admin.pdf" abc "vfv" -> ../matlab/license_admin.pdf
"license_admin.pdf" xyz -> license_admin.pdf
I tried the following code:
import re
base = '"../matlab/license_admin.pdf" abc "vfv"'
base1 = '"license_admin.pdf" xyz'
result = re.findall(r'\b(\S+\.pdf)\b', base)
result1 = re.findall(r'\b(\S+\.pdf)\b', base1)
print(result)
print(result1)
but it only works with my second example. The code removes ../ from my first one.
Could you please help me modify the regular expression \b(\S+\.pdf)\b to achieve my goal? Thank you so much!
Upvotes: 1
Views: 41
Reputation: 626926
Use
import re
bases = ['"../matlab/license_admin.pdf" abc "vfv"', '"license_admin.pdf" xyz']
for base in bases:
    m = re.search(r'"(.*?\.pdf)', base)
    if m:
        print(m.group(1))
See the Python demo
Output:
../matlab/license_admin.pdf
license_admin.pdf
The "(.*?\.pdf) pattern matches ", then captures into Group 1 any zero or more characters other than line break characters, as few as possible, followed by .pdf. With re.search, you get only the first match, and m.group(1) accesses the Group 1 value.
See the regex demo.
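If a string may hold several quoted values, a stricter variant could be used. The following is only a sketch, not part of the answer's original code: it assumes the file name is always enclosed in a pair of double quotes, and the helper name extract_pdf_paths and the third sample string are made up for illustration. It replaces .*? with [^"]* so a match cannot run across a quotation mark, and uses re.findall to collect every match rather than just the first.

```python
import re

def extract_pdf_paths(text):
    """Return every double-quoted substring of text that ends in .pdf.

    Hypothetical helper for illustration; assumes paths are wrapped in
    a matching pair of double quotes.
    """
    # [^"]* matches any characters except a quotation mark, so each
    # match stays inside a single quoted string.
    return re.findall(r'"([^"]*\.pdf)"', text)

samples = [
    '"../matlab/license_admin.pdf" abc "vfv"',
    '"license_admin.pdf" xyz',
    '"notes.txt" and "a.pdf" and "b/c.pdf"',  # invented sample with several quoted values
]
for s in samples:
    print(extract_pdf_paths(s))
```

On the third sample, the lazy pattern "(.*?\.pdf) would start at the first quotation mark and capture notes.txt" and "a.pdf, whereas the negated character class yields a.pdf and b/c.pdf separately.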
Upvotes: 1