Daniel Romero

Reputation: 317

Extracting multiple string values of variable length before and after a delimiter in a list

I have several Python lists in the following format:

rating = ['What is your rating for?: Bob', 'What is your rating for?: Alice', 'What is your rating for?: Mary Jane']

opinion = ['What is your opinion of?: Bob', 'What is your opinion of?: Alice', 'What is your opinion of?: Mary Jane']

I am trying to write a function that will evaluate a given list and generate two data structures from it:

  1. a list of the names that appear after the colons (:)
  2. a string variable that has the text that is repeated before the colons (:)

Ideally, both items would be named based off of the original list name. Also, the delimiter and the first space after it should be ignored.

Desired sample output for the two above examples:

rating_names = ['Bob', 'Alice', 'Mary Jane']
rating_text = 'What is your rating for?'

opinion_names = ['Bob', 'Alice', 'Mary Jane']
opinion_text = 'What is your opinion of?'

I've been able to make this work for a single list by removing a fixed string from each list item, but haven't quite figured out how to make it work for a variable number of characters before the delimiter and the potential of a two word name (e.g. 'Mary Jane') after it.

rating_names = [s.replace('What is your rating for?: ', '') for s in rating]

After searching, it appears that a regular expression with a look-ahead might be the solution, but I can't get that to work either.

Upvotes: 0

Views: 246

Answers (3)

Elazar

Reputation: 21635

use str.split():

>>> 'What is your rating for?: Bob'.split(': ')
['What is your rating for?', 'Bob']

to get the text and names:

>>> def get_text_name(arg):
...     temp = [x.split(': ') for x in arg]
...     return temp[0][0], [t[1] for t in temp]
...
>>> rating_text, rating_names = get_text_name(rating)
>>> rating_text
'What is your rating for?'
>>> rating_names
['Bob', 'Alice', 'Mary Jane']
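
A minimal sketch of the same idea with str.partition, which splits only at the first ': ' and so keeps working if a name happens to contain that sequence. The function name split_prompt_and_names is hypothetical, and the check that all items share one prompt is an assumption about the data rather than something the question states:

```python
def split_prompt_and_names(items, sep=': '):
    """Return (shared prompt text, list of names) for 'prompt: name' strings."""
    prompts = []
    names = []
    for item in items:
        # partition splits at the FIRST occurrence of sep only.
        prompt, found, name = item.partition(sep)
        if not found:
            raise ValueError('separator %r not found in %r' % (sep, item))
        prompts.append(prompt)
        names.append(name)
    # Assumption: every item in one list carries the same prompt text.
    if len(set(prompts)) > 1:
        raise ValueError('items do not share a single prompt')
    return (prompts[0] if prompts else ''), names


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
rating_text, rating_names = split_prompt_and_names(rating)
```

Raising on a missing separator is a design choice; silently skipping such items would also be reasonable, depending on how clean the input is.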

to get "variables" (you probably mean "dict", as has been said here):

>>> def get_text_name(arg):
...     temp = [x.split(': ') for x in arg]
...     return temp[0][0].split()[-2], [t[1] for t in temp]
... 
>>> text_to_name=dict([get_text_name(x) for x in [rating, opinion]])
>>> text_to_name
{'rating': ['Bob', 'Alice', 'Mary Jane'], 'opinion': ['Bob', 'Alice', 'Mary Jane']}
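
If the keys should follow the original list names instead of a word picked out of the question text, one option is to pass the labels in explicitly. This is only a sketch: build_named_results and the '_text' / '_names' key suffixes are hypothetical names modelled on the desired output in the question, and it assumes every item contains the ': ' separator:

```python
def build_named_results(named_lists, sep=': '):
    """Map '<label>_text' and '<label>_names' to the values split out of each list."""
    results = {}
    for label, items in named_lists.items():
        # maxsplit=1 keeps any later ': ' inside the name part.
        pairs = [item.split(sep, 1) for item in items]
        # Assumption: each item contains sep, so every pair has two elements.
        results[label + '_text'] = pairs[0][0] if pairs else ''
        results[label + '_names'] = [pair[1] for pair in pairs]
    return results


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob',
           'What is your opinion of?: Alice',
           'What is your opinion of?: Mary Jane']

named = build_named_results({'rating': rating, 'opinion': opinion})
```

Here named['rating_names'] holds the list of names and named['opinion_text'] holds the shared question text, so no variables need to be created dynamically.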

Upvotes: 1

Adam Lewis

Reputation: 7247

If you have a large number of lists to process, you may consider putting the data directly into a dictionary. This might help address your question to Elazar.

Code

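def dict_gen(d, l):
    # Add every name in list l to dict d, keyed by its question text.
    for s in l:
        # Split only at the first ': ' so a name may itself contain one.
        question, name = s.split(': ', 1)
        if question not in d:
            d[question] = []
        d[question].append(name)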

Usage

rating = ['What is your rating for?: Bob', 'What is your rating for?: Alice', 'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob', 'What is your opinion of?: Alice', 'What is your opinion of?: Mary Jane']

results = {}
dict_gen(results, rating)
dict_gen(results, opinion)

for key, value in results.items():
    print(key, value)

Yields

What is your rating for? ['Bob', 'Alice', 'Mary Jane']
What is your opinion of? ['Bob', 'Alice', 'Mary Jane']
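
The same grouping can be sketched with collections.defaultdict, which removes the explicit "if question not in d" check. The function name group_names_by_question is hypothetical, not part of the answer above:

```python
from collections import defaultdict


def group_names_by_question(*lists):
    """Group names from any number of 'question: name' lists by question text."""
    grouped = defaultdict(list)
    for items in lists:
        for item in items:
            # partition splits at the first ': '; if the separator is missing,
            # the whole item becomes the question and the name is ''.
            question, _, name = item.partition(': ')
            grouped[question].append(name)
    # Return a plain dict so later lookups of unknown keys raise KeyError.
    return dict(grouped)


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob',
           'What is your opinion of?: Alice',
           'What is your opinion of?: Mary Jane']

results = group_names_by_question(rating, opinion)
```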

Upvotes: 0

perreal

Reputation: 98078

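import re

def gr(l):
    # Dicts are used as sets: the keys collect the distinct questions and names.
    dq, ds = dict(), dict()
    for t in l:
        # Group 1 is the text up to the '?', group 2 is everything after the colon.
        for q, s in re.findall(r"(.*\?)\s*:\s*(.*)$", t):
            dq[q] = ds[s] = 1
    return list(dq), list(ds)

l = [gr(rating), gr(opinion)]
print(l)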
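
Plain dicts did not guarantee key order before Python 3.7, so on older versions the names can come back in a different order than they appear in the list. A sketch of the same regex approach that keeps first-seen order by using lists; extract_in_order and PROMPT_RE are hypothetical names, and the pattern assumes the question text ends in '?' as in the sample data:

```python
import re

# Same pattern as above, compiled once: text up to '?', a colon, then the name.
PROMPT_RE = re.compile(r'(.*\?)\s*:\s*(.*)$')


def extract_in_order(items):
    """Return (distinct question texts, distinct names), both in first-seen order."""
    texts, names = [], []
    for item in items:
        match = PROMPT_RE.match(item)
        if match is None:
            # Assumption: items that do not fit the pattern are skipped.
            continue
        text, name = match.groups()
        if text not in texts:
            texts.append(text)
        if name not in names:
            names.append(name)
    return texts, names


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
rating_texts, rating_names = extract_in_order(rating)
```

The "not in" checks are linear scans, which is fine for short lists like these; for very long lists a set alongside each list would avoid the repeated scanning.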

Upvotes: 1
