Reputation: 903
I would like to replace some characters in lines of strings. There are thousands of these lines in a data frame.
Example of string:
(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717
My code that didn't work:
for line in dat:
line.strip().split("\t")
line = sub(r'((\.+))',\2, line)
print line
The output that I want:
1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717
Upvotes: 0
Views: 70
Reputation: 19806
A simple approach is to use the split() and strip() functions.
We split the string to get a list of words, then use strip() to remove '(' and ')' at the beginning and end of each word; join() is then applied to the result to get the desired string.
A generator expression is used to loop over the list of words:
s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
res = ' '.join(item.strip('()') for item in s.split(' '))
print(res) # Output: 1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717
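One detail worth knowing: strip('()') only removes parentheses from the start and end of each space-separated word, so a parenthesis in the middle of a word survives. A small sketch of that behaviour, where the helper name `strip_word_parens` and the token `A(b)c` are made up for illustration and do not come from the question:

```python
def strip_word_parens(line):
    # Remove '(' and ')' only from the edges of each space-separated word.
    return ' '.join(word.strip('()') for word in line.split(' '))

# Parentheses wrapping a whole word are removed.
print(strip_word_parens('(1) W00001 + (0.5) Q00003'))  # 1 W00001 + 0.5 Q00003

# Hypothetical input: parentheses inside a word are left alone.
print(strip_word_parens('(1) A(b)c'))  # 1 A(b)c
```

For the data in the question this makes no difference, since every parenthesis sits at the edge of a word.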
Upvotes: 2
Reputation: 107297
Since you just want to remove the parentheses, I suggest using two replace() calls instead of a regex:
In [9]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
In [10]: s.replace('(', '').replace(')', '')
Out[10]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'
Or, if you are using Python 2.x, use the str.translate() method as a more efficient approach:
In [9]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
In [10]: s.translate(None, '()')
Out[10]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'
In Python 3.x, str.translate() takes only a translation table, so pass the characters to delete as the third argument of str.maketrans():
In [18]: s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
In [19]: s.translate(str.maketrans('', '', '()'))
Out[19]: '1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'
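Since the question mentions thousands of lines, the translation table can be built once and reused for every line. A minimal Python 3 sketch, assuming the lines are already available as a list of strings; the names `PAREN_TABLE`, `remove_parens` and `lines`, and the second sample line, are illustrative only:

```python
# Build the table once: no character mappings, delete '(' and ')'.
PAREN_TABLE = str.maketrans('', '', '()')

def remove_parens(lines):
    # Apply the same table to every line.
    return [line.translate(PAREN_TABLE) for line in lines]

lines = [
    '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717',
    '(2) A00001 <=> (3) B00002',  # made-up second line
]
for cleaned in remove_parens(lines):
    print(cleaned)
```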
Upvotes: 2
Reputation: 22952
If you want to remove the parentheses, you can use a simple regex:
import re
line = "(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717"
print(re.sub(r"[()]", "", line))
You get:
1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717
Upvotes: 2
Reputation: 16184
You need to refer to the captured group in the replacement string using the \g<group_number> format:
>>> s = '(1) W00001 + (0.5) Q00003 <=> (1.7227) U00002 + (4) X21717'
>>> import re
>>> re.sub(r'\(([\d\.]+)\)', r'\g<1>', s)
'1 W00001 + 0.5 Q00003 <=> 1.7227 U00002 + 4 X21717'
Also, that is probably the regex you want (as I added in the code):
\(([\d\.]+)\)
Translated to English, it means: a literal (, then a group containing one or more digits or . characters, then a closing ).
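Unlike approaches that delete every parenthesis, this pattern only removes parentheses that wrap a number. A small sketch to show the difference, where the names `NUMBER_IN_PARENS` and `unwrap_numbers` and the `f(x)` token are made up for illustration:

```python
import re

# Parentheses wrapping digits and dots only; the number is captured as group 1.
NUMBER_IN_PARENS = re.compile(r'\(([\d.]+)\)')

def unwrap_numbers(line):
    # Replace each "(number)" with just the captured number.
    return NUMBER_IN_PARENS.sub(r'\g<1>', line)

print(unwrap_numbers('(1) W00001 + (0.5) Q00003'))  # 1 W00001 + 0.5 Q00003

# Hypothetical input: parentheses around non-numeric text are kept.
print(unwrap_numbers('(2.5) f(x)'))  # 2.5 f(x)
```

Whether that selectivity is wanted depends on the data; for the sample line in the question all approaches give the same result.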
Upvotes: 1