Reputation: 2970
I have a question about regex/Python. Sorry if this topic has been discussed millions of times - usually I find the answers on so/google etc. but I'm stuck in the millions of answers with this one.. (To be honest - I own a regex book, but somehow I'm too stupid to really understand it...)
For a music management system I need to extract information from paths, offering different sets of options. Here are two examples:
"/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
it should extract:
"/[XLR 483] The Fat Of The Land/04 - The Prodigy - The Funky Stuff.flac"
should extract:
There is no need for a regex that covers both cases; these are just two examples. I'll then offer them as options (or as a starting point for users to add their own).
Any help would be greatly appreciated!
@ S.Lott: I don't have a regex for this, I started with splitting the string:
parts = rel_path.split('/')
track = parts[-1]
release = parts[-2]
artist = parts[-3]
but this looks like an extremely inflexible and inelegant solution to me.
So far I have something like:
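import re
# Raw string; "-" moved to the end of the class so it is a literal, and the dot before the extension is escaped.
pattern = re.compile(r'^/(?P<artist>[a-zA-Z0-9 ]+)/(?P<release>[a-zA-Z0-9 ]+)/(?P<track>[a-zA-Z0-9 _-]+)\.[a-zA-Z]*$')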
rel_path = '/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac'
match = pattern.search(rel_path)
artist = match.group('artist')
release = match.group('release')
track = match.group('track')
Upvotes: 3
Views: 20163
Reputation: 20196
import re
pattern1 = re.compile(r'/([^/]*)/([^/]*)/([0-9]*) - (.*)\.[^.]*')
artist, release, Tracknumber, Title = pattern1.match(file1).groups()
pattern2 = re.compile(r'/\[([^]]*)\] ([^/]*)/([0-9]*) - (.*) - (.*)\.[^.]*')
catno, release, Tracknumber, artist, Title = pattern2.match(file2).groups()
(where file1 and file2 are the paths you gave above).
First thing: you capture something matched by a regex with parentheses. So everything between parentheses in the patterns above will be returned as an item by groups().
Second: you match anything except a forward slash with regex code like [^/]. So to match a run of such characters between forward slashes, you do [^/]*.
Putting those together, to capture the artist in your first string, you do /([^/]*)/. Then you do that again to get the release.
Third: to match any digit, you use [0-9]. So, to match any string of digits, you use [0-9]*.
Apply those principles repeatedly, and you should be able to understand the above.
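For anyone who wants to try this directly, here is a self-contained sketch that runs the two patterns above against the two paths from the question. The names file1, file2, pattern1 and pattern2 come from the answer; the lower-case result names and the "2" suffixes are only my choice to keep the two results apart.

```python
import re

# The two example paths from the question.
file1 = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
file2 = "/[XLR 483] The Fat Of The Land/04 - The Prodigy - The Funky Stuff.flac"

# Layout 1: /artist/release/NN - title.ext
pattern1 = re.compile(r'/([^/]*)/([^/]*)/([0-9]*) - (.*)\.[^.]*')
# Layout 2: /[catno] release/NN - artist - title.ext
pattern2 = re.compile(r'/\[([^]]*)\] ([^/]*)/([0-9]*) - (.*) - (.*)\.[^.]*')

# groups() returns one string per pair of parentheses, in order.
artist, release, tracknumber, title = pattern1.match(file1).groups()
catno, release2, tracknumber2, artist2, title2 = pattern2.match(file2).groups()

print(artist, "|", release, "|", tracknumber, "|", title)
print(catno, "|", release2, "|", tracknumber2, "|", artist2, "|", title2)
```

Note that match() returns None when a path does not fit the pattern, so real code would probably check for that before calling groups().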
Upvotes: 3
Reputation: 199
Although not strictly necessary, re is a handy choice for this problem.
import re
# The dot before "flac" is escaped so it only matches a literal "."
pattern = re.compile(r"/(?P<artist>[a-zA-Z0-9 ]+?)/(?P<release>[a-zA-Z0-9 ]+?)/(?P<tracknumber>\d+?) - (?P<title>[a-zA-Z0-9 ]+?)\.flac")
s = "/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
m = pattern.search(s)
print(m.group('artist'))
print(m.group('release'))
print(m.group('tracknumber'))  # the group name must match the one in the pattern
print(m.group('title'))
I use expressions such as [a-zA-Z0-9 ] to explicitly specify the characters I expect in the string. It is just my preference to have a whitelist-like regex to make the code more secure. There are many other ways to compose equivalent patterns. You will find all you need at http://docs.python.org/library/re.html; you don't need a book for that.
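One consequence of the whitelist style is that a path containing any character outside the whitelist simply does not match, so the result has to be checked for None. A minimal sketch of that, assuming Python 3.4+ for fullmatch(); the helper name parse_track and the "(Live)" title are made up for illustration and are not part of the question:

```python
import re

# Same whitelist-style pattern as above, split over two lines for readability.
PATTERN = re.compile(
    r"/(?P<artist>[a-zA-Z0-9 ]+)/(?P<release>[a-zA-Z0-9 ]+)"
    r"/(?P<tracknumber>\d+) - (?P<title>[a-zA-Z0-9 ]+)\.flac"
)

def parse_track(path):
    """Return the named groups as a dict, or None if the path does not fit."""
    m = PATTERN.fullmatch(path)  # the whole path must match, not just a prefix
    return m.groupdict() if m else None

good = parse_track("/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac")
# Hypothetical title: the parentheses are not in the whitelist.
bad = parse_track("/The Prodigy/The Fat Of The Land/04 - Funky Stuff (Live).flac")

print(good)
print(bad)
```

Whether rejecting such paths is desirable depends on the collection; widening the character classes or switching to [^/]+ are the obvious alternatives.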
Upvotes: 6
Reputation: 11220
You should first use split with the / delimiter, so that the length of the list it returns already tells you which layout you are dealing with.
Then you can use a regexp if you need one. For instance, in the second case (which happens only if you have two elements, right?):
import re
item = "/[XLR 483] The Fat Of The Land/04 - The Prodigy - The Funky Stuff.flac"
matches = re.search(r'^\/?\[([^\]]+)](.*)\/', item)  # raw string so the backslashes reach the regex engine unchanged
print(matches.group(1))  # 'XLR 483'
print(matches.group(2))  # ' The Fat Of The Land' (note the leading space)
It may seem a bit complicated, but I have escaped all ambiguous characters, so basically the pattern is the following:
^ at the beginning
/? there can be at most one slash /, followed by...
\[ an opening square bracket
([^\]]+) containing anything but a closing square bracket, one or more times (+), captured via the grouping parentheses
] a closing square bracket, followed by
(.*) anything but a linefeed, zero or more times (*), captured via the parentheses
/ a final slash
Hope this helps!
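The split-first idea described at the start of this answer could look roughly like the sketch below. The function name parse_path, the returned dict keys, and the decision to swallow the space after "]" are my assumptions, not something the question specifies.

```python
import re

# Bracketed catalogue number followed by the release name, applied to a
# single folder name (no slashes). The optional spaces after "]" are
# consumed outside the group so the release has no leading blank.
RELEASE_WITH_CATNO = re.compile(r'^\[([^\]]+)\] *(.*)$')

def parse_path(rel_path):
    """Pick a layout from the number of path components."""
    parts = rel_path.strip('/').split('/')
    if len(parts) == 3:
        # /artist/release/track
        artist, release, track = parts
        return {'artist': artist, 'release': release, 'track': track}
    if len(parts) == 2:
        # /[catno] release/track
        folder, track = parts
        m = RELEASE_WITH_CATNO.match(folder)
        if m:
            return {'catno': m.group(1), 'release': m.group(2), 'track': track}
    return None  # unknown layout

first = parse_path("/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac")
second = parse_path("/[XLR 483] The Fat Of The Land/04 - The Prodigy - The Funky Stuff.flac")
print(first)
print(second)
```

The track file name is left unparsed here; it could be handed to a further regex such as the ones in the other answers.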
Upvotes: 0
Reputation: 56841
Here is my approach to the problem that you are having.
If you have any specific doubts about writing the regex, edit your question and follow S.Lott's suggestion.
Upvotes: 0