David542

Reputation: 110153

Regex to help split up list into two-tuples

Given a list of actors, each followed by their character name in brackets and separated by either a semicolon (;) or a comma (,):

Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; 
Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; 
Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; 
Alfie Bass [Harry]

How would I parse this into a list of two-tuples of the form [(actor, character), ...]?

--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'), ...,
     ('Denholm Elliott', 'Mr. Smith; abortionist'), ...]

I originally had:

import re
# splits on '[', ',' and ';' everywhere, including inside the brackets
actors = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]

But this doesn't quite work, because it also splits on the separators inside the brackets (e.g. the ; in [Mr. Smith; abortionist]).
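To see the failure concretely, here is the same split run on just the last two entries. This is a minimal sketch; `sample` is a hypothetical stand-in for `data['actors']`. The ; inside [Mr. Smith; abortionist] produces one extra piece, so the list has an odd length and the pairing loop runs past the end:

```python
import re

# hypothetical stand-in for data['actors'], trimmed to two entries
sample = "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]"

# same split as above: on '[', ',' and ';' everywhere
pieces = [item.strip().rstrip(']') for item in re.split(r'\[|,|;', sample)]
print(pieces)
# ['Eleanor Bron', 'Woman Doctor', 'Denholm Elliott', 'Mr. Smith', 'abortionist']

try:
    pairs = [(pieces[i], pieces[i + 1]) for i in range(0, len(pieces), 2)]
except IndexError:
    pairs = None  # the odd piece out has no partner, so pieces[i + 1] fails
print(pairs)  # None
```

On the full input the extra piece also shifts every pair after Denholm Elliott by one position.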

Upvotes: 2

Views: 141

Answers (2)

JBernardo

Reputation: 33397

You can go with something like:

>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'),
 ('Millicent Martin', 'Siddie'),
 ('Julia Foster', 'Gilda'),
 ('Jane Asher', 'Annie'),
 ('Shirley Ann Field', 'Carla'),
 ('Vivien Merchant', 'Lily'),
 ('Eleanor Bron', 'Woman Doctor'),
 ('Denholm Elliott', 'Mr. Smith; abortionist'),
 ('Alfie Bass', 'Harry')]
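Here is the same pattern written out with `re.VERBOSE` so each part can carry a comment. The string `s` is the question's input (joined into one string with its line breaks), and the comments are my reading of the pattern rather than anything the answer states. One detail worth knowing: inside a character class `$` is a literal dollar sign, not an end-of-string anchor, so `[,;\s$]*` just consumes separators and whitespace between pairs.

```python
import re

s = ("Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; \n"
     "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; \n"
     "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; \n"
     "Alfie Bass [Harry]")

PAIR_RE = re.compile(r"""
    (\w[\w\s\.]+?)      # actor: a word char, then (lazily) words, spaces and dots
    \s*                 # optional whitespace before the bracket
    \[([\w\s;\.,]+)\]   # character inside the brackets; ';' and ',' are allowed
    [,;\s$]*            # trailing separators/whitespace ('$' is literal here)
""", re.VERBOSE)

pairs = PAIR_RE.findall(s)
for actor, character in pairs:
    print(f"{actor!r} -> {character!r}")
```

Because `]` is not in the character-name class, the second group can never run past the closing bracket, which is what keeps the inner ; from splitting the pair.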

The pattern can be simplified by using the lazy .*? for both groups:

re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
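A quick check, using the question's input as `s` (repeated so the snippet runs on its own), that the simplified pattern returns the same pairs as the longer one. Note that `.` does not match a newline by default; that is fine here because no actor or character name spans a line break, and the trailing `[,;\s$]*` swallows the line breaks between pairs.

```python
import re

s = ("Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda]; \n"
     "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily]; \n"
     "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist]; \n"
     "Alfie Bass [Harry]")

full = re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
# lazy .*? in group 2 stops at the first ']', so the inner ';' stays in the name
simple = re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
print(simple == full)  # True
```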

Upvotes: 4

BlackVegetable

Reputation: 13044

inputData = inputData.replace("];", "\n")
inputData = inputData.replace("],", "\n")
inputData = inputData[:-1]
for line in inputData.split("\n"):
    actorList.append(line.partition("[")[0])
    dataList.append(line.partition("[")[2])
togetherList = zip(actorList, dataList)

This is a bit of a hack, and I'm sure you can clean it up from here. I'll walk through this approach just to make sure you understand what I'm doing.

I replace both "];" and "]," with a newline, which I later use to split every pair onto its own line. This works as long as no character name itself contains "];" or "],". The last pair, however, still ends with a ] because no comma or semicolon follows it, so I slice it off with [:-1].
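Here is the newline trick on a short sample, so you can see what the intermediate lines look like. `sample` is a hypothetical three-entry stand-in for the real input:

```python
sample = "Shelley Winters [Ruby]; Eleanor Bron [Woman Doctor], Alfie Bass [Harry]"

# "];" and "]," mark the end of a pair, so each becomes a line break
step = sample.replace("];", "\n").replace("],", "\n")
step = step[:-1]  # the last pair still ends in "]", so slice it off
lines = step.split("\n")
print(lines)
# ['Shelley Winters [Ruby', ' Eleanor Bron [Woman Doctor', ' Alfie Bass [Harry']
```

The leading spaces that survive on each line are the reason the halves need a `.strip()` before use.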

Then, using partition on each line created from the input string, I assign the part before the bracket to the actor list and the part after it to the data list, ignoring the bracket itself (which is at index 1 of the tuple that partition returns).
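For reference, `str.partition` always returns a 3-tuple `(before, separator, after)` and splits only at the first occurrence of the separator, so a ; after the bracket is left alone. A small example on one line as it looks after the newline replacement (typed in by hand here):

```python
line = " Denholm Elliott [Mr. Smith; abortionist"

before, bracket, after = line.partition("[")
print((before, bracket, after))
# (' Denholm Elliott ', '[', 'Mr. Smith; abortionist')

actor, character = before.strip(), after.strip()
print((actor, character))
# ('Denholm Elliott', 'Mr. Smith; abortionist')
```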

After that, Python's zip function finishes the job by pairing the i-th element of each list into a tuple, giving a list of matched (actor, character) pairs.
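One thing to watch: on Python 3, `zip` returns a lazy iterator rather than a list, so wrap it in `list()` if you need an actual list of tuples (or want to iterate over it more than once). A minimal illustration with two short hypothetical lists:

```python
actorList = ["Shelley Winters", "Alfie Bass"]
dataList = ["Ruby", "Harry"]

together = zip(actorList, dataList)  # a zip object (iterator) on Python 3
togetherList = list(together)        # materialize it into a list of tuples
print(togetherList)
# [('Shelley Winters', 'Ruby'), ('Alfie Bass', 'Harry')]

leftover = list(together)  # the iterator has already been consumed
print(leftover)  # []
```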

Upvotes: 1
