Reputation: 1795
How can I get rid of these u prefixes in the output?
Regex:
Tregex1 = r"1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"
Code:
import re
for a in re.findall(Tregex1, text_value, re.IGNORECASE):
    print a
Output:
(u'877', u'638', u'7848', u'\n', u'')
(u'650', u'627', u'1000', u'\n', u'')
(u'650', u'627', u'1001', u'\nE', u'')
(u'312', u'273', u'4100', u'', u'')
I tried the following, after reading several similar questions:
a.encode('ascii', 'ignore')
a.encode('utf-8')
",".join(a)
None of them work.
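A quick check of why those attempts fail, using the first tuple from the output above. The names encode_error and joined are only for illustration. In each loop iteration a is a tuple, not a string, so it has no encode() method at all; ",".join(a) does run, but it keeps every regex group, including the trailing '\n' and the empty string.

```python
# First tuple from the question's output (one re.findall() match).
a = (u'877', u'638', u'7848', u'\n', u'')

# a is a tuple, so calling .encode() on it raises AttributeError.
try:
    a.encode('utf-8')
    encode_error = None
except AttributeError as exc:
    encode_error = str(exc)
print(encode_error)

# ",".join(a) works, but it also joins the extension groups,
# so the newline and the empty string end up in the result.
joined = ",".join(a)
print(repr(joined))
```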
Expected Output:
877-638-7848
650-627-1000
650-627-1001
312-273-4100
I am using Python 2.7.
Also, can someone explain why the fourth element is sometimes \n, sometimes \nE, and sometimes blank?
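A likely explanation, reproduced on a hypothetical input (the real text_value was not posted, so the sample text below is an assumption). The fourth element is the optional extension group (\se?x?t?(\d*)). Its \s also matches a line break. With re.IGNORECASE, e? then matches an "E" at the start of the next line. If the number sits at the very end of the text, the optional group does not take part in the match, and re.findall() reports it as an empty string. One side effect of the pattern: if the next line started with digits, \d* would swallow them into the fifth group.

```python
import re

# Same pattern as in the question, written as a raw string.
Tregex1 = r"1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"

# Hypothetical input text, chosen to reproduce the question's output.
text_value = (u"Phone: 877-638-7848\n"
              u"Sales: 650-627-1000\n"
              u"Support: 650-627-1001\n"
              u"Email: info@example.com\n"
              u"Fax: 312-273-4100")

matches = re.findall(Tregex1, text_value, re.IGNORECASE)
for groups in matches:
    # groups[3] is the optional extension group:
    #   '\n'  -> \s matched the line break; e?, x?, t? and \d* matched nothing
    #   '\nE' -> the next line starts with "Email"; e? matched the "E"
    #            because of re.IGNORECASE
    #   ''    -> the number ends the text, so the optional group did not match
    print(repr(groups[3]))
```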
Upvotes: 0
Views: 220
Reputation: 504
Try this:
for a in re.findall(Tregex1, text_value, re.IGNORECASE):
    print '-'.join(a[:3])
The u just tells you that it's a unicode string.
The (..., ...) is how Python represents a tuple.
'-'.join(...) joins the strings in ... with a - between them.
a[:3] means "only the first three elements of a".
(For a good explanation of Python's slicing notation, look here: https://stackoverflow.com/a/509295/327293)
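A small sketch applying the slice-and-join idea to the four tuples from the question's output (the list name results is only for illustration). Printing a string shows its text, so no u appears; the u only shows up in the repr of a tuple or list that contains unicode strings.

```python
# Tuples copied from the question's output (one per re.findall() match).
results = [
    (u'877', u'638', u'7848', u'\n', u''),
    (u'650', u'627', u'1000', u'\n', u''),
    (u'650', u'627', u'1001', u'\nE', u''),
    (u'312', u'273', u'4100', u'', u''),
]

# a[:3] keeps the area code, exchange and line number;
# '-'.join() puts a dash between them.
numbers = ['-'.join(a[:3]) for a in results]
for n in numbers:
    print(n)
# 877-638-7848
# 650-627-1000
# 650-627-1001
# 312-273-4100
```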
Upvotes: 2
Reputation: 1227
The u just means the string is unicode; you can re-encode it however you like. This works, and it also skips the blank values:
a = (u'877', u'638', u'7848', u'\n', u'')
print "-".join([x.strip() for x in a if x.strip() != u""])
877-638-7848
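The same expression applied to all four tuples from the question's output (results and numbers are illustrative names). The third tuple shows one caveat: strip() turns u'\nE' into u'E', which is not blank, so the stray "E" is kept. Slicing with a[:3] first avoids that if only the phone number is wanted.

```python
# Tuples copied from the question's output.
results = [
    (u'877', u'638', u'7848', u'\n', u''),
    (u'650', u'627', u'1000', u'\n', u''),
    (u'650', u'627', u'1001', u'\nE', u''),
    (u'312', u'273', u'4100', u'', u''),
]

# strip() removes surrounding whitespace such as '\n'; the filter then
# drops any element that is empty after stripping.
numbers = ["-".join([x.strip() for x in a if x.strip() != u""]) for a in results]
for n in numbers:
    print(n)
# 877-638-7848
# 650-627-1000
# 650-627-1001-E
# 312-273-4100
```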
Upvotes: 1
Reputation: 599560
Your problem is not the u. If you want to format your results in a specific way, you should use the string formatting functions:
print '{0}-{1}-{2}'.format(*a)
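A sketch of the string-formatting approach on the four tuples from the question's output (results and numbers are illustrative names). str.format() fills only the fields that appear in the template and ignores extra positional arguments, so the extension groups are dropped. The older % operator gives the same result with u"%s-%s-%s" % a[:3].

```python
# Tuples copied from the question's output.
results = [
    (u'877', u'638', u'7848', u'\n', u''),
    (u'650', u'627', u'1000', u'\n', u''),
    (u'650', u'627', u'1001', u'\nE', u''),
    (u'312', u'273', u'4100', u'', u''),
]

# {0}, {1} and {2} take the first three items of each tuple;
# the remaining two positional arguments are ignored by str.format().
numbers = [u"{0}-{1}-{2}".format(*a) for a in results]
for n in numbers:
    print(n)
# 877-638-7848
# 650-627-1000
# 650-627-1001
# 312-273-4100
```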
Upvotes: 1