Reputation: 1817
I'm currently trying to parse a Python string for some specific text inside of it. It should actually be really straightforward.
But more importantly, I want to know if regex is a "tool set" type thing, where you know a certain number of tricks? Some people are very, very proficient with them, and I want to attain that proficiency.
So while I am asking how to match this string, I'd also like an explanation of your thought process as you arrived at your solution.
I basically want text A, text-B, and text_C, delimited only by commas.
The desired output string:
"text A,text-B,text_C"
The original text is as follows:
"(1, u'text A', u'text-B', u'text_C')"
In my limited understanding, the main thing separating each expression is a single quote, so I would start with that. But ultimately I might have strings such as text-'A and I want to make sure that I don't run into errors because I parsed the string incorrectly.
Thanks for your time. Remember: thought process.
Upvotes: 0
Views: 305
Reputation: 142106
Since the string you're dealing with is the repr of a Python tuple, the most Pythonic way is to use ast.literal_eval, which can take that string and safely convert it back to a Python object, retaining the correct types:
import ast
text = "(1, u'text A', u'text-B', u'text_C')"
tup = ast.literal_eval(text)
Then if you only wish to join each item that's a string together:
# basestring exists only on Python 2; use str on Python 3
joined = ','.join(el for el in tup if isinstance(el, basestring))
# text A,text-B,text_C
Otherwise, just slice the tuple with tup[1:] and join the items in that...
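As a rough sketch of how the literal_eval route copes with the embedded-quote case from the question (written for Python 3, so the check is against str rather than basestring; extract_strings is a made-up helper name, not anything from the standard library):

```python
import ast

def extract_strings(text):
    """Parse a tuple repr and join its string items with commas (hypothetical helper)."""
    # literal_eval only evaluates Python literals; it does not execute arbitrary code
    items = ast.literal_eval(text)
    return ",".join(el for el in items if isinstance(el, str))

plain = "(1, u'text A', u'text-B', u'text_C')"
print(extract_strings(plain))

# repr() switches to double quotes when a string contains a single quote,
# which is exactly the situation a quote-splitting approach trips over
tricky = repr((1, "text-'A", "text-B"))
print(extract_strings(tricky))
```

Because the parsing is done by Python's own literal grammar, the quoting style of each item does not matter to the caller.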
In terms of a regex, a quick and dirty, non-robust method that will break easily, and may even return incorrect matches under some circumstances, is:
import re
string_vals = re.findall("'(.*?)'", text)
This finds anything after a ' up until the very next '... Again, using ast.literal_eval is much nicer here...
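On the thought-process side, one way to reason about a sturdier pattern is to describe a quoted literal in pieces: an opening quote, a body that may not contain that same quote unless it is backslash-escaped, and a matching closing quote. The following is only a sketch under that assumption (Python 3 syntax; STRING_LITERAL and quoted_parts are illustrative names), and it leaves escape sequences in the body undecoded:

```python
import re

# (['"])             opening quote, remembered as group 1
# (?:\\.|(?!\1).)*   body: an escaped character, or any character that is
#                    not the quote that opened the literal
# \1                 closing quote, same kind as the opening one
STRING_LITERAL = re.compile(r"""(['"])((?:\\.|(?!\1).)*)\1""")

def quoted_parts(text):
    """Return the body of every quoted literal found in text."""
    return [m.group(2) for m in STRING_LITERAL.finditer(text)]

text = "(1, u'text A', u'text-B', u'text_C')"
print(",".join(quoted_parts(text)))

# A value containing a single quote is repr'd with double quotes
tricky = "(1, u\"text-'A\", u'text-B')"
naive = re.findall("'(.*?)'", tricky)   # pairs up the wrong quotes
print(naive)
print(quoted_parts(tricky))
```

The naive pattern treats the apostrophe inside the first value as an opening quote, so everything after it is paired up wrongly; the backreference version avoids that, but it is still reimplementing part of what ast.literal_eval already does.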
Upvotes: 3
Reputation: 2828
Must it be regex? :(
a_str = "(1, u'text A', u'text-B', u'text_C')"
print ",".join(a_str[1:-1].split(",")[1:]).replace('u','').replace("'",'')
Yields:
text A, text-B, text_C
EDIT: Well, if it must be regex, don't mind this post; it doesn't work for many cases.
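To make "doesn't work for many cases" concrete, here is the same chain of calls wrapped in a function (split_and_strip is just a name chosen for this sketch, and the second input is invented for illustration), written for Python 3:

```python
def split_and_strip(a_str):
    # Same steps as the one-liner above: drop the parentheses, split on
    # commas, skip the first field, then delete every "u" and every "'"
    return ",".join(a_str[1:-1].split(",")[1:]).replace("u", "").replace("'", "")

ok = split_and_strip("(1, u'text A', u'text-B', u'text_C')")
print(repr(ok))

# Any "u" inside the text is deleted too, not just the u'' prefix
broken = split_and_strip("(1, u'blue sun', u'a,b')")
print(repr(broken))
```

The first call happens to work apart from a leading space, while the second loses letters from the values themselves.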
Upvotes: 0