user27976

Reputation: 903

Replacing characters in lines of strings

I would like to replace some characters in lines of strings. There are thousands of these lines in a data frame.

Example of string:

(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717

My code that didn't work:

for line in dat:
    line.strip().split("\t")
    line = sub(r'((\.+))',\2, line)
    print line

The output that I want:

1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717

Upvotes: 0

Views: 70

Answers (4)

ettanany

Reputation: 19806

A simple approach is to use the split() and strip() methods.

Split the string into a list of words, use strip() to remove '(' and ')' from the beginning and end of each word, then apply join() to the result to get the desired string.

A generator expression loops over the list of words:

s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
res = ' '.join(item.strip('()') for item in s.split(' '))
print(res)  # Output: 1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717
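
To make the three steps visible, here is a minimal sketch that runs them one at a time. It also shows a limitation: strip() only trims the ends of each word. The token 'a(b)c' is made up for illustration and does not come from the question's data.

```python
s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'

# Step 1: split on single spaces to get the individual words.
words = s.split(' ')

# Step 2: trim '(' and ')' from both ends of every word.
stripped = [w.strip('()') for w in words]

# Step 3: glue the words back together with single spaces.
res = ' '.join(stripped)
print(res)

# strip() never touches characters in the middle of a word.
# 'a(b)c' is a hypothetical token used only to show this.
inner = 'a(b)c'.strip('()')
print(inner)
```

If parentheses can appear in the middle of a word, one of the other answers (replace(), translate() or a regex) is likely a better fit.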

Upvotes: 2

Kasravnd

Reputation: 107297

Since you just want to remove the parentheses, I suggest using two replace() calls instead of a regex:

In [9]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'

In [10]: s.replace('(', '').replace(')', '')
Out[10]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'

Or, if you are using Python 2.x, a more efficient approach is the str.translate() method:

In [9]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'

In [10]: s.translate(None, '()')
Out[10]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'

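In Python 3.x, string.maketrans() no longer exists and str.translate() takes a single table, so pass the characters to delete as the third argument of str.maketrans():

In [18]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'

In [19]: s.translate(str.maketrans('', '', '()'))
Out[19]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'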
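
Since the question mentions thousands of lines, the deletion table can be built once and reused for every line. This is a minimal Python 3 sketch. The names DROP_PARENS, lines and cleaned are illustrative, and the second sample line is made up.

```python
# Build the table once; the third argument of str.maketrans()
# lists the characters that translate() should delete.
DROP_PARENS = str.maketrans('', '', '()')

# Hypothetical sample lines standing in for the rows of the data frame.
lines = [
    '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717',
    '(2) A00001 <=> (3) B00002',
]

# Apply the same table to every line.
cleaned = [line.translate(DROP_PARENS) for line in lines]
for line in cleaned:
    print(line)
```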

Upvotes: 2

Laurent LAPORTE

Reputation: 22952

If you want to remove the parentheses, you can use a simple regex:

import re

line = "(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717"
print(re.sub(r"[()]", "", line))

You get:

1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717

Upvotes: 2

Uriel

Reputation: 16184

You need to capture a group and refer to it in the replacement string, using the form \g<group_number>:

>>> s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
>>> import re
>>> re.sub(r'\(([\d\.]+)\)', r'\g<1>', s)
'1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'

Also, the pattern I used in the code is probably the regex you want:

\(([\d\.]+)\)

Translated to English, it means: a literal (, then a group containing one or more digits or . characters, then a closing ).
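
Unlike removing every parenthesis, this pattern only unwraps groups made of digits and dots. The sketch below illustrates the difference. The "(aq)" suffix is a made-up token that is not in the question's data, and NUMBER_IN_PARENS is just an illustrative name.

```python
import re

# "(" followed by one or more digits or dots (captured), followed by ")".
NUMBER_IN_PARENS = re.compile(r'\(([\d.]+)\)')

# Hypothetical line: "(aq)" is invented to show a non-numeric group.
line = '(1) W00001 + (0.5) Q00003(aq)'

# Replace each match with the text of capture group 1.
result = NUMBER_IN_PARENS.sub(r'\g<1>', line)
print(result)
```

Here the numeric coefficients lose their parentheses while "(aq)" is left alone, which may or may not be what you want depending on the data.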

Upvotes: 1
