Reputation: 607
I am having a bit of trouble putting this logic on paper:
The string I would like to parse: "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
This string can vary but the structure is always "NAME+EXT+FILESIZE"
I want to return the extension. However, I cannot just split("."), because the name itself can contain dots (here "v1.3" and ".ppt").
So I came up with something else:
stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
ext = [".pdf",".jpg",".ppt",".txt",".doc"]
for i in ext:
    indx = stringy.find(i)
    ...
I got stuck where I need to figure out how to tell Python to take the extension starting at the largest index found. It should be something like whatiwant = stringy[indx:4], but I can't figure out how to tell it to take only the largest index. The largest index corresponds to the last extension in the string, which is the one I want. In this particular example, I don't care about "ppt", but rather the "pdf".
Can this perhaps be done in a more pythonic way? Or at least more efficiently?
Upvotes: 1
Views: 983
Reputation: 250891
Using regex:
>>> import re
>>> strs="Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
>>> re.findall(r"(\.\w+)",strs)[-1]
'.pdf'
or:
>>> re.findall(r".*(\.\w+)",strs)
['.pdf']
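Both patterns above pick the last dot-plus-word run anywhere in the string, so a size such as "1.5MB" would come back as ".5MB". A minimal sketch that ties the match to the trailing " - SIZE" part instead, assuming the string always ends with " - " followed by a size token with no spaces (the pattern and the name ext_before_size are illustrative, not from the question):

```python
import re

# Assumption: the string always ends with " - <size>", e.g. " - 500KB".
# The extension is the ".word" immediately before that suffix.
_EXT_BEFORE_SIZE = re.compile(r"(\.\w+)\s+-\s+\S+$")

def ext_before_size(s):
    """Return the extension right before the trailing ' - SIZE', or None."""
    m = _EXT_BEFORE_SIZE.search(s)
    return m.group(1) if m else None

print(ext_before_size("Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"))
```

If the size part can contain spaces (say "500 KB"), the \S+ at the end would need to be loosened.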
Upvotes: 1
Reputation: 14854
In [44]: stringy[stringy.rfind('.'):stringy.rfind('.')+4]
Out[44]: '.pdf'
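The slice above assumes a three-letter extension and that the last dot in the string belongs to it. The same rfind idea can also be applied to the list of known extensions from the question, keeping whichever one occurs furthest to the right. This is only a sketch; the function name last_known_extension is made up for illustration:

```python
def last_known_extension(s, known=(".pdf", ".jpg", ".ppt", ".txt", ".doc")):
    """Return the known extension whose last occurrence is furthest right."""
    best_idx, best_ext = -1, None
    for ext in known:
        idx = s.rfind(ext)  # -1 when ext does not occur in s
        if idx > best_idx:
            best_idx, best_ext = idx, ext
    return best_ext  # None if no known extension was found

stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
print(last_known_extension(stringy))
```

Because the matching extension itself is returned, there is no need to guess its length. Note that this is plain substring matching, so ".doc" would also match inside ".docx".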
Upvotes: 2
Reputation: 2340
Try this:
>>> stringy = "Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"
>>> extension = stringy.split(".")[-1].split("-")[0].strip()
>>> extension
'pdf'
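A variation on the same split idea, sketched with the standard library: cut off the size at the last " - " with str.rpartition, then let os.path.splitext find the extension. It assumes the size is always separated by " - "; the name ext_via_splitext is just for illustration:

```python
import os.path

def ext_via_splitext(s):
    """Drop the trailing ' - SIZE' part, then split off the extension."""
    # rpartition splits at the LAST " - ", so a " - " inside the name is kept.
    name, sep, _size = s.rpartition(" - ")
    if not sep:  # no " - " found: treat the whole string as the name
        name = s
    return os.path.splitext(name)[1]

print(ext_via_splitext("Jan - 2012 Presentation v1.3.ppt.pdf - 500KB"))
```

Unlike the one-liner above, this keeps the leading dot and returns an empty string when the name has no extension.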
Upvotes: 0