Reputation: 110153
Given a list of actors, with their their character name in brackets, separated by either a semi-colon (;) or comm (,):
Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda];
Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily];
Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist];
Alfie Bass [Harry]
How would I parse this into a list of two-typles in the form of [(actor, character),...]
--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'),
('Denholm Elliott', 'Mr. Smith; abortionist')]
I originally had:
actors = [item.strip().rstrip(']') for item in re.split('\[|,|;',data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]
But this doesn't quite work, as it also splits up items within brackets.
Upvotes: 2
Views: 141
Reputation: 33397
You can go with something like:
>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'),
('Millicent Martin', 'Siddie'),
('Julia Foster', 'Gilda'),
('Jane Asher', 'Annie'),
('Shirley Ann Field', 'Carla'),
('Vivien Merchant', 'Lily'),
('Eleanor Bron', 'Woman Doctor'),
('Denholm Elliott', 'Mr. Smith; abortionist'),
('Alfie Bass', 'Harry')]
One can also simplify some things with .*?
:
re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
Upvotes: 4
Reputation: 13044
inputData = inputData.replace("];", "\n")
inputData = inputData.replace("],", "\n")
inputData = inputData[:-1]
for line in inputData.split("\n"):
actorList.append(line.partition("[")[0])
dataList.append(line.partition("[")[2])
togetherList = zip(actorList, dataList)
This is a bit of a hack, and I'm sure you can clean it up from here. I'll walk through this approach just to make sure you understand what I'm doing.
I am replacing both the ;
and the ,
with a newline, which I will later use to split up every pair into its own line. Assuming your content isn't filled with erroneous ];
or ],
's this should work. However, you'll notice the last line will have a ]
at the end because it didn't have a need a comma or semi-colon. Thus, I splice it off with the third line.
Then, just using the partition function on each line that we created within your input string, we assign the left part to the actor list, the right part to the data list and ignore the bracket (which is at position 1).
After that, Python's very useful zip funciton should finish the job for us by associating the i
th element of each list together into a list of matched tuples.
Upvotes: 1