Reputation: 5769
I have the following data:
'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50'
I'm having some problems using regex patterns to find everything:
pattern1 = re.compile('DOMA (.*)\r\n')
pattern2 = re.compile('Name: (.*)\r\n')
pattern3 = re.compile('Best: (.*)\r\n')
pattern4 = re.compile('Location: (.*)\r\n')
pattern5 = re.compile('Game Wins: (.*)\r\n')
pattern6 = re.compile('Time: (.*)')
All of the above work; however, sometimes my data looks like:
'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50\r\nREF: Yes'
pattern6 returns the wrong result because it has no \r\n terminator. How can I get around this so that it only returns what's on its current line?
Is pattern 6 supposed to be like:
pattern6 = re.compile(r'Time: (.*)')
or
pattern6 = re.compile('Time: (.*?)')
or
pattern6 = re.compile(r'Time: (.*?)')
Thanks in advance - Hyflex
Upvotes: 0
Views: 99
Reputation: 4069
This is the sort of problem that re.MULTILINE (re.M for short) was made for. Compile the pattern as:
pattern6 = re.compile(r"Time: .*$", flags=re.M)
You can make that more specific by using r"^Time: .*$", requiring "Time: " to start a line, or add some leading space tolerance with r"^\s*Time: .*$".
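To make the difference between those three variants concrete, here is a small sketch. The sample text and the names loose, anchored, and tolerant are made up for illustration and are not from the question; the text is assumed to use plain "\n" line endings already.

```python
import re

# Hypothetical sample with "\n" line endings: one indented "Time:" line,
# one line where "Time: " appears mid-line, and one line starting with it.
text = "Name: Ryan\n  Time: 09:10:50\nLap Time: 01:02\nTime: 11:00:00\nREF: Yes"

loose = re.compile(r"Time: .*$", flags=re.M)        # "Time: " anywhere in a line
anchored = re.compile(r"^Time: .*$", flags=re.M)    # "Time: " must start the line
tolerant = re.compile(r"^\s*Time: .*$", flags=re.M) # leading whitespace allowed

print(loose.findall(text))     # ['Time: 09:10:50', 'Time: 01:02', 'Time: 11:00:00']
print(anchored.findall(text))  # ['Time: 11:00:00']
print(tolerant.findall(text))  # ['  Time: 09:10:50', 'Time: 11:00:00']
```

Note that the tolerant variant keeps the leading whitespace in the match, since \s* is part of the pattern.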
Maybe this is paranoid, but the first thing I'd do before searching is filter out the \r\n newlines. I don't have to do this on Windows Python 2.7, but I don't see a guarantee in the docs that all environments will treat \r\n and \n equivalently. The easy way to do that is re.sub("\r\n", "\n", s), which replaces every "\r\n" in s with a "\n". [Note: The easier way is to use s.replace(), but as I said in the comments, this works.]
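As a rough illustration of why the normalization step can matter, the sketch below runs the same pattern on the raw string and on a copy normalized with str.replace(). The variable names raw_matches, normalized, and clean_matches are mine. In Python's re module, "." matches "\r" (it only excludes "\n" by default), so on a string that really contains "\r\n" the carriage return ends up inside the match.

```python
import re

s2 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50\r\nREF: Yes'
pattern6 = re.compile(r"Time: .*$", flags=re.M)

# On the raw string, ".*" runs up to (but not including) "\n",
# so the "\r" is swallowed into the match.
raw_matches = pattern6.findall(s2)

# Normalize line endings first, here with str.replace() instead of re.sub().
normalized = s2.replace("\r\n", "\n")
clean_matches = pattern6.findall(normalized)

print(raw_matches)    # ['Time: 09:10:50\r']
print(clean_matches)  # ['Time: 09:10:50']
```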
s1 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50'
s2 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50\r\nREF: Yes'
print("s1: %s" % pattern6.findall(re.sub('\r\n', '\n', s1)))
print("s2: %s" % pattern6.findall(re.sub('\r\n', '\n', s2)))
Output:
s1: ['Time: 09:10:50']
s2: ['Time: 09:10:50']
Another advantage here is that ^ and $ are zero-width: they match a position and don't consume any characters, so you don't end up with the line ending being part of the match, and you don't need to add parentheses to make that happen.
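If only the value is wanted rather than the whole "Time: ..." line, a capturing group can still be combined with the anchors. The following is a sketch, not part of the answer above: the name value_pattern and the optional \r? before $ are my own additions, the latter so that the pattern also works on strings that still contain "\r\n".

```python
import re

s1 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50'
s2 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50\r\nREF: Yes'

# Lazy group for the value; "\r?" absorbs an optional carriage return
# so that it stays out of the captured text.
value_pattern = re.compile(r"^Time: (.*?)\r?$", flags=re.M)

print(value_pattern.findall(s1))  # ['09:10:50']
print(value_pattern.findall(s2))  # ['09:10:50']
```

With a single group in the pattern, findall() returns just the captured values, so no slicing of the "Time: " prefix is needed.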
Upvotes: 1
Reputation: 142116
Make the delimiter \r\n or $ (which means "end of string" in a regex). Also, instead of multiple patterns, just use one generic pattern, put the results in a dictionary, then extract the named parts after:
s = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50'
import re
res = dict(re.findall(r'(.*?): (.*?)(?:\r\n|$)', s))
# {'Name': 'Ryan', 'Alias': '3K', 'Location': 'Eng', 'Time': '09:10:50', 'Game Wins': '51', 'Best': '1'}
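For the second form of the data (with the trailing REF line), the same pattern should carry over unchanged. A short sketch, where the name fields is just illustrative:

```python
import re

s2 = 'DOMA A\r\nName: Ryan\r\nBest: 1\r\nAlias: 3K\r\nLocation: Eng\r\nGame Wins: 51\r\nTime: 09:10:50\r\nREF: Yes'

# Each "key: value" pair ends either at "\r\n" or at the end of the string.
fields = dict(re.findall(r'(.*?): (.*?)(?:\r\n|$)', s2))

print(fields['Time'])      # 09:10:50
print(fields.get('REF'))   # Yes
print(fields.get('DOMA'))  # None
```

The 'DOMA A' line has no ": " separator, so it is not picked up by this pattern and would need separate handling.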
Upvotes: 3