Reputation: 980
I have an awfully formatted data file with the following structure:
" id1 id2 id3 id4"
" id1 id2 id3 id4"
" id1 id2 id3 id4"
I need to retrieve id2 and id4 from each line, but the number of spaces between the ids differs from line to line. Is there a way I could replace all consecutive spaces in each line with a single character such as '\t', so that I can retrieve the second and fourth items in each line? I appreciate any help.
Upvotes: 1
Views: 1162
Reputation: 4211
This is not the most elegant way to do it, but it is easy to understand. This function replaces each run of consecutive spaces with a single space.
def remove_extra_spaces(s):
    s_res = ""
    flip = False
    for c in s:
        if c == ' ':
            # first one is ok, next ones not
            if not flip:
                s_res += c
                flip = True
        else:
            flip = False
            s_res += c
    return s_res
Upvotes: 0
Reputation: 19733
Using re.sub:
>>> import re
>>> s = " id1 id2 id3 id4"
>>> re.sub(r'\s+', ' ', s.strip())
'id1 id2 id3 id4'
You can use split and slicing:
>>> s = " id1 id2 id3 id4"
>>> s.split()[1::2]
['id2', 'id4']
Using re.findall:
>>> s = " id1 id2 id3 id4"
>>> re.findall('id[24]',s)
['id2', 'id4']
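Note that the pattern 'id[24]' only matches fields literally named id2 and id4. As a minimal sketch, assuming the real fields are arbitrary non-whitespace tokens, you could match every token with \S+ and pick by position instead. The sample string and variable names here are illustrative only:

```python
import re

# Sample line with irregular spacing (illustrative data, not from a real file).
s = "   id1  id2     id3 id4"

# \S+ matches each run of non-whitespace characters, i.e. each field.
tokens = re.findall(r'\S+', s)

# Indexes 1 and 3 are the second and fourth fields.
second, fourth = tokens[1], tokens[3]
print(second, fourth)
```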
Upvotes: 3
Reputation: 180391
You just need to split the string to get the elements:
s = " id1 id2 id3 id4"
frst,sec,th,frth = s.split()
print(sec,frth)
id2 id4
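Unpacking into four names raises ValueError when a line does not contain exactly four fields. A possible guard, assuming malformed lines should simply be reported as None (the helper name second_and_fourth is hypothetical):

```python
def second_and_fourth(line):
    """Return (second, fourth) fields, or None if the line lacks exactly four fields."""
    try:
        # split() with no argument splits on any run of whitespace.
        _, sec, _, frth = line.split()
    except ValueError:
        # Raised by the unpacking when there are too few or too many fields.
        return None
    return sec, frth

print(second_and_fourth("  id1   id2 id3    id4"))
print(second_and_fourth("id1 id2 id3"))
```

Whether to skip, log, or abort on a malformed line depends on how much you trust the file.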
Upvotes: 1
Reputation: 5061
>>> s = " id1 id2 id3 id4"
>>> s.split()
['id1', 'id2', 'id3', 'id4']
>>> '\t'.join(s.split())
'id1\tid2\tid3\tid4'
>>> print('\t'.join(s.split()))
id1 id2 id3 id4
To extract id2 and id4, use indexing with str.split:
>>> a, b = s.split()[1], s.split()[3]
>>> a, b
('id2', 'id4')
Upvotes: 1
Reputation: 336108
The simplest way would be to use .split(), which automatically splits on any number of whitespace characters and ignores leading and trailing whitespace:
>>> s = " id1 id2 id3 id4"
>>> items = s.split()
>>> items
['id1', 'id2', 'id3', 'id4']
That way, you can access items[1] and items[3] directly. If you want to rebuild them into a tab-separated string, use .join():
>>> "\t".join(items)
'id1\tid2\tid3\tid4'
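Since the question is about a whole file, here is a rough sketch of applying the same idea line by line. The function name extract_columns and the sample data are made up for illustration, and io.StringIO stands in for the real file; with a file on disk you would iterate over the object returned by open() instead.

```python
import io

def extract_columns(lines, indexes=(1, 3)):
    """Yield the requested fields from each whitespace-separated line."""
    for line in lines:
        items = line.split()
        if len(items) <= max(indexes):
            continue  # skip blank or too-short lines
        yield tuple(items[i] for i in indexes)

# Stand-in for the badly formatted file (illustrative data only).
fake_file = io.StringIO("  a1   a2 a3    a4\n b1 b2     b3 b4\n\n")

rows = list(extract_columns(fake_file))
print(rows)
```

Skipping short lines silently is an assumption here; raising an error may be preferable if every line is expected to have four fields.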
Upvotes: 10