tumultous_rooster

Reputation: 12550

Traversing a list of lists by index within a loop, to reformat strings

I have a list of lists, pulled in from a poorly formatted CSV file, that looks like this:

DF = [['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]

I would like to end up with a new structure like this:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

after which I can further split, strip, etc.
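For instance, once the records are combined, a regular expression can split the customer number and the note text back apart. This is a minimal sketch, assuming every combined record has exactly the `Customer Number: <id> Notes: <text>` shape. `split_record` and `RECORD_RE` are hypothetical names:

```python
import re

# Assumed shape of each combined record: "Customer Number: <id> Notes: <text>"
RECORD_RE = re.compile(r"Customer Number:\s*(\S+)\s+Notes:\s*(.*)")

def split_record(record):
    """Return (customer_id, note) for one combined record string."""
    match = RECORD_RE.match(record)
    if match is None:
        raise ValueError("unexpected record format: %r" % record)
    return match.group(1), match.group(2).strip()

print(split_record('Customer Number: 103 Notes: bought a ton of stuff got a free keychain'))
# → ('103', 'bought a ton of stuff got a free keychain')
```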

So, I used the facts that:

to code up what is clearly an absurd solution, even though it works.

DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []

for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])
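Part of what makes this fragile is `DF.index(record)`. `list.index` returns the position of the *first* matching element. If the same customer line ever appears twice, every lookup for the second occurrence therefore starts from the first one. Here is a small illustration with made-up data:

```python
# Made-up flattened data in which customer 001 appears twice
lines = ['Customer Number: 001 ', 'Notes: first visit',
         'Customer Number: 002 ', 'Notes: another customer',
         'Customer Number: 001 ', 'Notes: second visit']

# Pair each customer line's real position with what list.index reports
positions = [(pos, lines.index(line))
             for pos, line in enumerate(lines)
             if line.startswith('Customer Number: ')]

print(positions)  # → [(0, 0), (2, 2), (4, 0)]
```

At position 4, `DF.index(record) + 1` would read 'Notes: first visit' again instead of 'Notes: second visit'. Walking the list with `enumerate`, or simply remembering the most recent customer line, avoids the search entirely.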

Would anyone mind recommending a more stable and intelligent solution to this kind of problem?

Upvotes: 16

Views: 511

Answers (6)

Padraic Cunningham

Reputation: 180391

Just keep track of when we find a new customer:

from pprint import pprint as pp

out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)

Output:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

If a customer can appear again later and you want all of their notes grouped together, use a dict:

from collections import defaultdict
d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])
# convert to a plain dict so pprint shows it in the format below
pp(dict(d))

Output:

{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
                           'stuff and was easy to deal with'],
 'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
                           'James Bond',
                           'Customer Number: 007 Notes: came in with a '
                           'martini'],
 'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
                           'stuff got a free keychain',
                           'Customer Number: 103 Notes: gave us a referral '
                           'to his uncles cousins hairdresser',
                           'Customer Number: 103 Notes: name address '
                           'birthday social security number on file'],
 'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
                           'like Chris Farley on that hidden decaf skit '
                           'from SNL']}

Based on your comment and the error you got, you seem to have lines that come before the first actual customer, so we can add them to the first customer in the list:

# added ["foo"] before we see any customer

DF = [["foo"],['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]


from pprint import pprint as pp

from itertools import takewhile, islice

# find lines up to first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))

out = []
ln = len(start)
# if we had data before we actually found a customer this will be True
if start: 
    # so set cust to first customer in list and start adding to out
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln will either be 0 if start is empty else we start at first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])

pp(out)

Which outputs:

['Customer Number: 001 foo',
 'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

I assumed that you would consider lines appearing before any customer to belong to the first customer.
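You can get the same behaviour in a single pass. Buffer any lines seen before the first customer, then attach them to that customer once it appears. This is a minimal sketch, not the answer's own code. The function name `attach_notes` is hypothetical. It raises an error if no customer line exists at all:

```python
def attach_notes(rows, marker="Customer Number"):
    """Prefix every non-customer line with the customer it belongs to.

    Lines seen before the first customer are buffered and attached to
    that first customer once it appears.
    """
    out = []
    pending = []   # lines seen before any customer
    cust = None
    for sub in rows:
        line = sub[0]
        if line.startswith(marker):
            if cust is None:
                # First customer: flush the buffered leading lines to it
                out.extend(line + p for p in pending)
                pending = []
            cust = line
        elif cust is None:
            pending.append(line)
        else:
            out.append(cust + line)
    if pending:
        raise ValueError("no customer line found")
    return out
```

With the `["foo"]` example above, this should produce the same list as the takewhile/islice version.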

Upvotes: 13

Marcin

Reputation: 238111

You can also use an OrderedDict, where the keys are customers and the values are lists of notes:

from collections import OrderedDict

DF_dict = OrderedDict()

for subl in DF:
    if 'Customer Number' in subl[0]:  
        DF_dict[subl[0]] = []
        continue    
    last_key = list(DF_dict.keys())[-1]
    DF_dict[last_key].append(subl[0])


for customer, notes in DF_dict.items():
    for a_note in notes:
        print(customer,a_note)

Results in:

Customer Number: 001  Notes: Bought a ton of stuff and was easy to deal with
Customer Number: 666  Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL
Customer Number: 103  Notes: bought a ton of stuff got a free keychain
Customer Number: 103  Notes: gave us a referral to his uncles cousins hairdresser
Customer Number: 103  Notes: name address birthday social security number on file
Customer Number: 007  Notes: looked a lot like James Bond
Customer Number: 007  Notes: came in with a martini

Putting the values in a dict like this can be useful if you want to count how many notes a given customer has, or to select only the notes for a given customer.
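Both of those lookups take a single line once the dict exists. The sketch below uses a small hand-built `DF_dict` with two customers from the question, in the same shape the loop above produces:

```python
from collections import OrderedDict

# Small sample in the same shape as DF_dict above (customer -> list of notes)
DF_dict = OrderedDict([
    ('Customer Number: 001 ', ['Notes: Bought a ton of stuff and was easy to deal with']),
    ('Customer Number: 007 ', ['Notes: looked a lot like James Bond',
                               'Notes: came in with a martini']),
])

# How many notes each customer has
note_counts = {customer: len(notes) for customer, notes in DF_dict.items()}

# Only the notes for one customer (empty list if the key is missing)
bond_notes = DF_dict.get('Customer Number: 007 ', [])

print(note_counts)
print(bond_notes)
```

The keys keep the trailing space from the CSV, so strip them first if you want to look customers up by a clean string.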

An alternative that avoids calling list(DF_dict.keys())[-1] on every iteration:

last_key = ''

for subl in DF:
    if 'Customer Number' in subl[0]:  
        DF_dict[subl[0]] = []
        last_key = subl[0]
        continue    

    DF_dict[last_key].append(subl[0])

And a newer, shorter version using defaultdict:

from collections import defaultdict

DF_dict = defaultdict(list)

for subl in DF:
    if 'Customer Number' in subl[0]:         
        customer = subl[0]
        continue        

    DF_dict[customer].append(subl[0])
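On Python 3.7 and later, a plain dict also preserves insertion order, so neither OrderedDict nor defaultdict is strictly needed. `dict.setdefault` creates the empty list the first time a customer is seen. Unlike `DF_dict[subl[0]] = []`, it does not wipe out earlier notes when a customer line repeats. This is a minimal sketch, assuming Python 3.7+ and that the first row is a customer line:

```python
rows = [['Customer Number: 001 '],
        ['Notes: Bought a ton of stuff and was easy to deal with'],
        ['Customer Number: 007 '],
        ['Notes: looked a lot like James Bond'],
        ['Notes: came in with a martini']]

notes_by_customer = {}   # insertion-ordered on Python 3.7+
for (line,) in rows:     # each inner list holds exactly one string
    if line.startswith('Customer Number'):
        customer = line
        # keeps existing notes if this customer was seen before
        notes_by_customer.setdefault(customer, [])
    else:
        notes_by_customer[customer].append(line)

print(list(notes_by_customer))
```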

Upvotes: 3

thefourtheye

Reputation: 239443

Your basic objective is to group the notes and associate them with their customer. Since each customer's notes already come directly after the customer line, you can simply use itertools.groupby, like this:

from itertools import groupby, chain

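def build_notes(it):
    customer, func = "", lambda x: x.startswith('Customer')
    # flatten the argument (not the global DF) into one sequence of strings
    for item, grp in groupby(chain.from_iterable(it), key=func):
        if item:
            # several customer lines in a row means the earlier ones had no
            # notes; the notes that follow belong to the last of them
            customer = list(grp)[-1]
        else:
            for note in grp:
                yield customer + note
            # In Python 3.x, you can simply do
            # yield from (customer + note for note in grp)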

Here, we flatten the list of lists into a sequence of strings with chain.from_iterable. Then we group consecutive lines according to whether they start with Customer. For a group of customer lines, item is True; for a group of notes, it is False. When item is True, we record the customer information; when item is False, we iterate over the grouped notes and yield one string at a time, concatenating the customer information with each note.
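To see what groupby hands to the loop, here is the grouping on a shortened slice of the flattened data. groupby only merges *adjacent* items with the same key. That is why this approach relies on each customer's notes following directly after the customer line:

```python
from itertools import groupby

lines = ['Customer Number: 103 ',
         'Notes: bought a ton of stuff got a free keychain',
         'Notes: gave us a referral to his uncles cousins hairdresser',
         'Customer Number: 007 ',
         'Notes: looked a lot like James Bond']

# groupby only merges *consecutive* items that share the same key
groups = [(is_customer, list(grp))
          for is_customer, grp in groupby(lines, key=lambda x: x.startswith('Customer'))]

for is_customer, grp in groups:
    print(is_customer, grp)
```

This yields four groups, alternating True (one customer line) and False (that customer's notes).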

So, when you run the code,

print(list(build_notes(DF)))

you get

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

Upvotes: 4

Brobin

Reputation: 3326

As long as you can count on the first element being a customer, you can do it like this.

Simply loop through each item. If the item is a customer, set the current customer as that string. Else, it is a note, so you append the customer and the note to the list of results.

customer = ""
results = []
for record in DF:
    data = record[0]
    if "Customer" in data:
        customer = data
    elif "Notes" in data:
        result = customer + data
        results.append(result)

print(results)
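If you cannot count on the first element being a customer, the loop above silently prefixes such notes with an empty string. One option is to fail loudly instead. This is a minimal sketch of the same loop. The function name `combine_strict` is hypothetical:

```python
def combine_strict(rows):
    """Combine customer and note lines, refusing notes with no customer."""
    customer = None
    results = []
    for record in rows:
        data = record[0]
        if "Customer" in data:
            customer = data
        elif "Notes" in data:
            if customer is None:
                raise ValueError("note found before any customer: %r" % data)
            results.append(customer + data)
    return results
```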

Upvotes: 2

Paul Rooney

Reputation: 21609

DF = [['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]

custnumstr = None
out = []
for df in DF:
    if df[0].startswith('Customer Number'):
        custnumstr = df[0]
    else:
        out.append(custnumstr + df[0])

for e in out:
    print(e)

Upvotes: 3

ProfOak

Reputation: 551

As long as the format is the same as your example, this should work.

final_list = []
for outer_list in DF:
    for s in outer_list:
        if s.startswith("Customer"):
            cust = s
        elif s.startswith("Notes"):
            final_list.append(cust + s)

for f in final_list:
    print(f)

Upvotes: 2
