Masih

Reputation: 980

How to replace consecutive spaces in a string in python

I have an awfully formatted data file with the following structure:

" id1    id2             id3         id4"
"  id1    id2            id3         id4"
" id1     id2             id3         id4"

I need to retrieve id2 and id4 from each line, but the number of spaces between the ids differs from line to line. Is there a way I could replace every run of consecutive spaces in a line with a single character such as '\t', so that I can retrieve the second and fourth items of each line? I appreciate any help.

Upvotes: 1

Views: 1162

Answers (6)

Fabian

Reputation: 4211

This is not the most elegant way to do it, but it is easy to understand. This function replaces each run of consecutive spaces with a single space.

def remove_extra_spaces(s):
    s_res = ""
    flip = False
    for c in s:
        if c == ' ':
            # first one is ok, next ones not
            if not flip:
                s_res += c
            flip = True
        else:
            flip = False
            s_res += c

    return s_res
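The same keep-one-space-per-run idea can be written more compactly with itertools.groupby, which groups consecutive identical characters. This is a sketch, not the answer's own code; the name collapse_spaces is hypothetical. Note that a leading space survives collapsing, so the line is stripped before splitting on a single space to pick out the second and fourth fields:

```python
from itertools import groupby


def collapse_spaces(s: str) -> str:
    """Replace every run of consecutive spaces with a single space (hypothetical helper)."""
    # groupby yields (char, iterator) pairs, one per run of identical characters;
    # a run of spaces becomes one space, any other run is kept unchanged.
    return "".join(" " if ch == " " else "".join(run) for ch, run in groupby(s))


line = " id1    id2             id3         id4"
collapsed = collapse_spaces(line)  # ' id1 id2 id3 id4'

# The single leading space is still there, so strip it before splitting on " ".
fields = collapsed.strip().split(" ")
print(fields[1], fields[3])  # id2 id4
```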

Upvotes: 0

Hackaholic

Reputation: 19733

Using re.sub:

>>> import re
>>> s = " id1    id2             id3         id4"
>>> re.sub(r'\s+', ' ', s.strip())
'id1 id2 id3 id4'
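Since the question asks for a tab separator specifically, the replacement string can be "\t" instead of a space. A minimal sketch of that variant, after which splitting on the tab gives clean fields:

```python
import re

s = " id1    id2             id3         id4"

# Strip the ends first, then turn each run of whitespace into a single tab.
tabbed = re.sub(r"\s+", "\t", s.strip())
print(repr(tabbed))  # 'id1\tid2\tid3\tid4'

# Splitting on the tab now yields exactly one entry per field.
fields = tabbed.split("\t")
print(fields[1], fields[3])  # id2 id4
```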

You can also use split and slicing:

>>> s = " id1    id2             id3         id4"
>>> s.split()[1::2]
['id2', 'id4']
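Applied to the whole file, the same split-and-index approach runs once per line. In this sketch an io.StringIO stands in for the real file (with a real file you would iterate over open(...) instead; no file name is given in the question), and it assumes the quotation marks shown in the question are not part of the data. Lines with fewer than four fields are skipped:

```python
import io

# Stand-in for the badly formatted data file.
data = io.StringIO(
    " id1    id2             id3         id4\n"
    "  id1    id2            id3         id4\n"
    " id1     id2             id3         id4\n"
)

pairs = []
for line in data:
    fields = line.split()  # also drops the trailing newline
    if len(fields) < 4:
        continue  # skip blank or short lines
    pairs.append((fields[1], fields[3]))

print(pairs)  # [('id2', 'id4'), ('id2', 'id4'), ('id2', 'id4')]
```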

Using re.findall (this only works if the fields literally look like id2 and id4):

>>> s = " id1    id2             id3         id4"
>>> re.findall('id[24]',s)
['id2', 'id4']
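If the real IDs do not literally match id[24], a more general pattern is \S+, which matches each maximal run of non-whitespace characters. This is a sketch of that alternative, not part of the original answer:

```python
import re

s = " id1    id2             id3         id4"

# \S+ finds every whitespace-separated token, whatever it looks like.
tokens = re.findall(r"\S+", s)
print(tokens)        # ['id1', 'id2', 'id3', 'id4']
print(tokens[1::2])  # ['id2', 'id4']
```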

Upvotes: 3

Padraic Cunningham

Reputation: 180391

You just need to split to get elements:

s = " id1    id2             id3         id4"
first, second, third, fourth = s.split()
print(second, fourth)  # prints: id2 id4
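A small variation: unused fields can be bound to _, and because unpacking requires exactly four fields, a malformed line raises ValueError, which can be caught to skip it. A sketch, where the short line is a made-up example:

```python
s = " id1    id2             id3         id4"

# Bind the fields you don't need to _.
_, second, _, fourth = s.split()
print(second, fourth)  # id2 id4

# Unpacking needs exactly four fields; any other count raises ValueError.
bad = " id1    id2"  # hypothetical malformed line
try:
    _, second, _, fourth = bad.split()
except ValueError as exc:
    print("skipping malformed line:", exc)
```

The failed unpacking assigns nothing, so second and fourth still hold the values from the first line.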

Upvotes: 1

Vishnu Upadhyay

Reputation: 5061

>>> s = " id1    id2             id3         id4"
>>> s.split()
['id1', 'id2', 'id3', 'id4']
>>> '\t'.join(s.split())
'id1\tid2\tid3\tid4'
>>> print('\t'.join(s.split()))
id1     id2     id3     id4

To extract id2 and id4, index into the result of str.split():

>>> parts = s.split()
>>> a, b = parts[1], parts[3]
>>> a, b
('id2', 'id4')
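An alternative to indexing twice is operator.itemgetter, which, when given several indices, returns a tuple of those items. A sketch of that standard-library option:

```python
from operator import itemgetter

s = " id1    id2             id3         id4"

# itemgetter(1, 3) builds a callable returning (items[1], items[3]).
get_second_and_fourth = itemgetter(1, 3)
a, b = get_second_and_fourth(s.split())
print(a, b)  # id2 id4
```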

Upvotes: 1

Andrey

Reputation: 60065

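import re
s = " id1    id2             id3         id4"
# Replace each run of one or more spaces with a single space
re.sub(' +', ' ', s)  # ' id1 id2 id3 id4'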
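The pattern ' +' only matches space characters, so tabs (or a mix of tabs and spaces) are left alone, whereas r'\s+' collapses any whitespace. A short comparison on a made-up string containing a tab:

```python
import re

s = "id1 \t id2   id3"  # hypothetical line mixing spaces and a tab

# ' +' touches only spaces, so the tab and the spaces around it remain.
print(repr(re.sub(" +", " ", s)))    # 'id1 \t id2 id3'

# r'\s+' treats spaces, tabs and newlines alike.
print(repr(re.sub(r"\s+", " ", s)))  # 'id1 id2 id3'
```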

Upvotes: 1

Tim Pietzcker

Reputation: 336108

The simplest way would be to do a .split() which automatically splits on any number of whitespace characters and ignores leading and trailing whitespace:

>>> s = " id1    id2             id3         id4"
>>> items = s.split()
>>> items
['id1', 'id2', 'id3', 'id4']

That way, you can access items[1] and items[3] directly. If you want to rebuild them into a tab-separated string, use .join():

>>> "\t".join(items)
'id1\tid2\tid3\tid4'
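To rewrite the whole file as tab-separated values rather than one string, the standard csv module can write each split line with delimiter="\t". In this sketch an in-memory buffer stands in for the input and output files, and it assumes the quotes in the question are not part of the data:

```python
import csv
import io

# Sample lines standing in for the badly formatted file.
raw_lines = [
    " id1    id2             id3         id4",
    "  id1    id2            id3         id4",
    " id1     id2             id3         id4",
]

out = io.StringIO()
# lineterminator="\n" avoids csv's default "\r\n" line endings.
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
for line in raw_lines:
    writer.writerow(line.split())

tsv = out.getvalue()
print(repr(tsv))

# Reading the TSV back gives clean rows, so items 1 and 3 are easy to pick out.
picked = []
for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
    picked.append((row[1], row[3]))
print(picked)  # [('id2', 'id4'), ('id2', 'id4'), ('id2', 'id4')]
```

With real files, the csv documentation recommends opening them with newline="".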

Upvotes: 10
