Reputation: 5922
I have this string, input from a webpage.
s = "[u'967208', u'411600', u'460273']"
I want to remove the brackets [ ], the u prefixes and the ' quotes. I would also like to replace the commas , with line breaks.
I have spent much time searching for a solution, including encoding and regex, but I can't seem to get it working.
Updated: This is what I use to retrieve the string:
import selenium
import re
input = webdriver.find_element_by_class_name("class_name")
s = re.findall("((?<=\()[0-9]*)", input.text)
Upvotes: 1
Views: 864
Reputation: 180401
If you just want the digits, use re with \d+:
import re
s = "[u'967208', u'411600', u'460273']"
print "\n".join(re.findall(r"\d+", s))
967208
411600
460273
It is safe and efficient. Compared with ast.literal_eval, the compiled regex is several times faster here:
In [7]: timeit "\n".join(literal_eval(s))
100000 loops, best of 3: 11.7 µs per loop
In [8]: r = re.compile(r"\d+")
In [9]: timeit "\n".join(r.findall(s))
1000000 loops, best of 3: 1.35 µs per loop
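The timings above come from an IPython session. A rough way to repeat the comparison as a plain script is sketched below with the standard timeit module. The helper names via_literal_eval and via_regex are made up for illustration, and the absolute numbers will differ by machine and Python version.

```python
import ast
import re
import timeit

s = "[u'967208', u'411600', u'460273']"
digits = re.compile(r"\d+")

def via_literal_eval():
    # Parse the string as a Python list literal, then join the items.
    return "\n".join(ast.literal_eval(s))

def via_regex():
    # Pull out every run of digits with the precompiled pattern.
    return "\n".join(digits.findall(s))

# Both approaches should give the same text for this input.
assert via_literal_eval() == via_regex()

n = 10000  # number of calls per measurement
for name, func in (("literal_eval", via_literal_eval), ("regex", via_regex)):
    per_call = timeit.timeit(func, number=n) / n
    print("%s: %.2f us per loop" % (name, per_call * 1e6))
```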
If your goal is to write each string to a file, you can use the csv module to write the list of strings returned from re.findall, using a newline as the delimiter:
s = u"[u'967208', u'411600', u'460273']"
import csv
import re
with open("out.csv","w") as out:
    wr = csv.writer(out,delimiter="\n")
    r = re.compile(r"\d+")
    wr.writerow(r.findall(s))
Output:
967208
411600
460273
If you have many strings, just iterate over them, calling r.findall on each and passing the result to writerow.
I think the comments have solved the mystery: you had a list of digits all along, returned from your regex by findall, so you can do the following:
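Looping over several strings could look roughly like the sketch below. It uses plain write calls instead of csv.writer, so it does not depend on csv dialect settings. The strings list and the write_digits helper are hypothetical, and an in-memory buffer stands in for the output file.

```python
import io
import re

# Hypothetical inputs standing in for several scraped strings.
strings = [
    "[u'967208', u'411600', u'460273']",
    "[u'123', u'456']",
]

digits = re.compile(r"\d+")

def write_digits(strings, out):
    """Write every run of digits found in each string on its own line."""
    for s in strings:
        for number in digits.findall(s):
            out.write(number + "\n")

# An in-memory buffer keeps the sketch self-contained; a file object
# from open("out.csv", "w") could be passed in the same way.
buf = io.StringIO()
write_digits(strings, buf)
print(buf.getvalue())
```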
u"abc(967208) \n abc2(411600) \n abc3(460273)" # input.text
import csv
import re
with open("out.csv","w") as out:
    wr = csv.writer(out,delimiter="\n")
    r = re.compile(r"\((\d+)\)")
    wr.writerow(r.findall(input.text))
\((\d+)\) will find one or more digits inside parentheses.
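To see why the parentheses matter, here is a small sketch run against the sample text from above (assumed to have the same shape as the real page text). A bare \d+ also picks up the digits in names such as abc2, while the parenthesised pattern keeps only the numbers between ( and ).

```python
import re

# Sample text in the shape shown for input.text (assumed, not real page data).
text = u"abc(967208) \n abc2(411600) \n abc3(460273)"

# \( and \) match literal parentheses; (\d+) captures the digits between them.
in_parens = re.compile(r"\((\d+)\)")
numbers = in_parens.findall(text)

# A bare \d+ matches every run of digits, including the 2 in "abc2".
all_digits = re.findall(r"\d+", text)

print("\n".join(numbers))
```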
Upvotes: 2
Reputation: 49318
>>> import ast
>>> s = "[u'967208', u'411600', u'460273']"
>>> a = ast.literal_eval(s)
>>> print(*a, sep='\n')
967208
411600
460273
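ast.literal_eval only evaluates Python literals, so it rejects anything that is not a plain list, string, number and so on. If the input might not always be a well-formed list, a guarded version could look like this sketch. The function name ids_to_lines is made up for illustration, and returning None on bad input is just one possible choice.

```python
import ast

def ids_to_lines(s):
    """Parse a string holding a Python list literal and join its items with newlines.

    Returns None when the string is not a valid list literal.
    """
    try:
        items = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # literal_eval raises one of these for malformed or non-literal input.
        return None
    if not isinstance(items, list):
        return None
    return "\n".join(str(item) for item in items)

print(ids_to_lines("[u'967208', u'411600', u'460273']"))
```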
Upvotes: 6