I've got a file formatted like this:
3 name1
2 name2
1 name3
The space between the number and the name can be one or several spaces, or any number of tabs.
I'm trying to find a way to match this line with a regex and extract the number and the name in a list or tuple.
I could write this in several lines, but I'd rather have one clean line that can both recognize tabs and whitespace and give me my values. I've been unsuccessful in doing that.
Edit: I've tried using re.search('^[\d]+[\s|\t]+.*', line) to match any number of digits, then spaces or tabs, and then anything. But this doesn't work, presumably because I'm not telling it what to extract for me.
Upvotes: 1
Views: 344
Reputation: 180411
You don't need a regex at all; you can use str.split. It does not matter whether there are 1 or 21 spaces between the fields:
lines="""3 name1
2 name2
1 name3"""
for line in lines.splitlines():
    num, name = line.split()
    print(num, name)
3 name1
2 name2
1 name3
In a list comprehension:
print([line.split() for line in lines.splitlines()])
[['3', 'name1'], ['2', 'name2'], ['1', 'name3']]
Replace lines.splitlines() with your file object in your own code.
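As a rough sketch of what that could look like with a file object: the snippet below uses io.StringIO as a stand-in for an open file (an assumption, so that it runs without a file on disk), and the helper name parse_counts is hypothetical. It also converts the number to int, which the question did not ask for explicitly.

```python
import io

def parse_counts(fileobj):
    """Return a list of (number, name) tuples from an iterable of lines."""
    pairs = []
    for line in fileobj:
        fields = line.split()  # splits on any run of spaces and/or tabs
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        pairs.append((int(fields[0]), fields[1]))
    return pairs

# io.StringIO stands in for something like open("yourfile.txt")
sample = io.StringIO("3 name1\n2\t\tname2\n1   \t name3\n")
print(parse_counts(sample))
```

Note that int() raises ValueError if the first field is not a number, so this assumes the file is well formed.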
Using a regex to split on whitespace is not a very good approach:
In [13]: timeit re.search('^(\d+)\s+(.*)', line).groups()
1000000 loops, best of 3: 2.04 µs per loop
In [14]: timeit line.split()
1000000 loops, best of 3: 222 ns per loop
In [15]: re.search('^(\d+)\s+(.*)', line).groups()
Out[15]: ('1', 'abc')
In [16]: line.split()
Out[16]: ['1', 'abc']
split returns the same two values (as a list rather than a tuple) in just over a tenth of the time.
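To reproduce a comparison like this outside IPython, something along these lines should work with the standard timeit module. It is only a sketch: the pattern is precompiled here, which is not what the session above measured, and the absolute numbers depend on the machine and Python version.

```python
import re
import timeit

line = "1\t abc"
pattern = re.compile(r"^(\d+)\s+(.*)")

regex_result = pattern.match(line).groups()
split_result = line.split()

# Both approaches yield the same two values; only the container type differs.
assert list(regex_result) == split_result

# Time 100,000 calls of each approach.
regex_time = timeit.timeit(lambda: pattern.match(line).groups(), number=100_000)
split_time = timeit.timeit(lambda: line.split(), number=100_000)
print(f"regex: {regex_time:.4f}s  split: {split_time:.4f}s")
```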
Even if there are more than two values you can split and extract the first two:
lines="""3 name1 foo
2 name2 bar
1 name3 foobar """
print( [line.split(None, 2)[:2] for line in lines.splitlines()])
[['3', 'name1'], ['2', 'name2'], ['1', 'name3']]
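The second argument to split in the example above is maxsplit, which caps the number of splits and leaves the remainder in the last element. A small illustration, using a made-up sample line:

```python
line = "3 name1 foo bar"

# No limit: every run of whitespace is a separator.
unlimited = line.split()
print(unlimited)      # ['3', 'name1', 'foo', 'bar']

# At most 2 splits: the remainder stays together in the last element.
two_splits = line.split(None, 2)
print(two_splits)     # ['3', 'name1', 'foo bar']

# Keep only the number and the first name.
num, name = two_splits[:2]
print(num, name)      # 3 name1

# At most 1 split: everything after the number is kept as one string.
one_split = line.split(None, 1)
print(one_split)      # ['3', 'name1 foo bar']
```

The last form may be the better fit if the name itself can contain spaces, though that is an assumption about the data rather than something stated in the question.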
Upvotes: 3
Reputation: 113844
All you need to do is add parens around what you want to capture:
>>> line='1\t abc'
>>> re.search(r'^(\d+)\s+(.*)', line).groups()
('1', 'abc')
Incidentally, notice that the regex you used starts with ^, which (without re.MULTILINE) matches only at the beginning of the string. Since match also anchors at the beginning of the string, it can be used in place of search here, with the ^ dropped:
>>> re.match(r'(\d+)\s+(.*)', line).groups()
('1', 'abc')
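One thing to watch for: match and search return None when a line does not fit the pattern, so calling .groups() directly raises AttributeError on such lines. A minimal sketch that guards against this is below; the helper name extract is hypothetical, and the pattern is written as a raw string so the backslashes reach the regex engine unchanged. The last lines show why the pattern without parentheses gave nothing to extract.

```python
import re

# Same pattern as above, compiled once.
PATTERN = re.compile(r"(\d+)\s+(.*)")

def extract(line):
    """Return (number, name) as strings, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(extract("1\t abc"))      # ('1', 'abc')
print(extract("3   name1"))    # ('3', 'name1')
print(extract("no digits"))    # None

# Without parentheses the pattern still matches, but there are no groups.
m = re.search(r"^\d+\s+.*", "1\t abc")
print(repr(m.group(0)))        # '1\t abc' (the whole match)
print(m.groups())              # ()
```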
Upvotes: 5