jdero

Reputation: 1817

Best way to parse string with delimiters (thinking regex)?

I'm trying to parse a Python string and extract some specific text from it. It should be fairly straightforward.

More importantly, I want to know whether regex is a "tool set" kind of skill, where you learn a certain number of tricks. Some people are very, very proficient with regular expressions, and I want to reach that level.

So while I am asking how to match this string, I'd also like an explanation of the thought process that led you to your solution.

I basically want text A, text-B, and text_C, delimited only by commas.

The desired output string:

"text A,text-B,text_C"

The original text is as follows:

"(1, u'text A', u'text-B', u'text_C')"

From my limited understanding, the main thing separating each value is a single quote, so I would start with that. But ultimately I might have strings such as text-'A, and I want to make sure I don't run into errors by parsing the string incorrectly.

Thanks for your time. Remember: thought process.

Upvotes: 0

Views: 305

Answers (2)

Jon Clements

Reputation: 142106

Since the string you're dealing with is the repr of a Python tuple, the most Pythonic way is to use ast.literal_eval, which safely converts that string back into a Python object while retaining the correct types:

import ast
text = "(1, u'text A', u'text-B', u'text_C')"
tup = ast.literal_eval(text)
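As a quick check of what literal_eval returns and why it is called "safe", here is a minimal sketch. It assumes Python 3 (3.3+), where the u prefix is still accepted and simply produces ordinary str objects; the os.getcwd call string is just an illustrative non-literal input.

```python
import ast

text = "(1, u'text A', u'text-B', u'text_C')"
tup = ast.literal_eval(text)

# The result is a real tuple, and each item keeps its own type
print(type(tup).__name__)                  # tuple
print([type(el).__name__ for el in tup])   # ['int', 'str', 'str', 'str']

# Unlike eval(), literal_eval only evaluates literals, so a function call is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected a non-literal expression")
```

This is the main reason to prefer it over eval() when the string comes from outside your program.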

Then, if you only want to join the items that are strings:

joined = ','.join(el for el in tup if isinstance(el, str))  # use basestring on Python 2
# text A,text-B,text_C

Otherwise, just slice the tuple (tup[1:]) and join the items in that slice.
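The two options side by side, as a minimal Python 3 sketch. Filtering by type works wherever the strings sit in the tuple; slicing assumes the leading item is always the one to drop and that every remaining item is a string (otherwise str.join raises TypeError).

```python
import ast

text = "(1, u'text A', u'text-B', u'text_C')"
tup = ast.literal_eval(text)

# Option 1: keep only the string items, wherever they appear
by_type = ",".join(el for el in tup if isinstance(el, str))

# Option 2: drop the first item by position (assumes the rest are all strings)
by_slice = ",".join(tup[1:])

print(by_type)   # text A,text-B,text_C
print(by_slice)  # text A,text-B,text_C
```

Joining on "," rather than ", " gives exactly the output format asked for in the question.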

As for a regex, a quick-and-dirty, non-robust method (one that breaks easily and may even return incorrect matches in some cases) is:

import re
string_vals = re.findall("'(.*?)'", text)

This captures everything after a ' up to the very next '. Again, ast.literal_eval is much nicer here.
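To see why the quick pattern is fragile, consider the text-'A case from the question. Python's repr switches to double quotes around a string that contains a single quote, so the naive pattern pairs up the wrong quotes. The sketch below (Python 3; the pattern names SINGLE, DOUBLE and LITERAL are illustrative) also shows a somewhat sturdier regex that accepts either quote style and skips over backslash escapes. It is still only an approximation: escapes are left undecoded, and prefixes such as r'' or triple-quoted strings are not handled.

```python
import ast
import re

# A repr containing text-'A: that item is wrapped in double quotes
tricky = """(1, u"text-'A", u'text-B')"""

# The quick pattern matches from the quote inside text-'A to the next quote
print(re.findall("'(.*?)'", tricky))   # ['A", u']

# Sturdier sketch: a single- or double-quoted run, allowing \-escaped characters
SINGLE = r"'((?:[^'\\]|\\.)*)'"
DOUBLE = r'"((?:[^"\\]|\\.)*)"'
LITERAL = re.compile(SINGLE + "|" + DOUBLE)

# findall returns (single, double) pairs; exactly one group matched in each pair
values = [single or double for single, double in LITERAL.findall(tricky)]
print(values)                          # ["text-'A", 'text-B']

# literal_eval gets it right without any pattern work
print(ast.literal_eval(tricky)[1:])    # ("text-'A", 'text-B')
```

The thought process is to describe the token you want (an opening quote, any run of non-quote or escaped characters, then the same closing quote) rather than splitting on a delimiter that can also appear inside the data.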

Upvotes: 3

sihrc

Reputation: 2828

Must it be regex? :(

a_str = "(1, u'text A', u'text-B', u'text_C')"
print(",".join(a_str[1:-1].split(",")[1:]).replace('u','').replace("'",''))

Yields:

text A, text-B, text_C

EDIT: Well, if it must be regex, don't mind this post; it doesn't work in many cases.
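A small illustration of those cases, as a Python 3 sketch: the one-liner is wrapped in a function (split_and_replace is just an illustrative name) and fed a value containing the letter u. Because replace('u', '') removes every u, not only the prefix, the text itself gets corrupted, and the printed result also keeps a leading space.

```python
import ast

def split_and_replace(a_str):
    # The split/replace one-liner from this answer, wrapped for reuse
    return ",".join(a_str[1:-1].split(",")[1:]).replace('u', '').replace("'", '')

tricky = "(1, u'menu', u'a, b')"
print(repr(split_and_replace(tricky)))   # ' men, a, b'  (the u in "menu" is gone)
print(ast.literal_eval(tricky)[1:])      # ('menu', 'a, b')
```

The comma inside 'a, b' also means the pieces produced by split(",") no longer line up with the original items.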

Upvotes: 0
