Reputation: 12550
I have a list of lists, pulled in from a poorly formatted CSV file, that looks like this:
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
I would like to end up with a new structure like this:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
after which I can further split, strip, etc.
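As an illustration of that follow-up step, here is a minimal sketch (not from the original post) that splits each combined string back into a customer number and the note text with str.partition. It assumes every combined string contains the literal label "Notes: " exactly as in the sample data; the names combined and records are illustrative.

```python
# Two of the combined strings from the desired output above
combined = [
    'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
    'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
]

records = []
for line in combined:
    # partition splits at the first "Notes: " -> (before, separator, after)
    cust_part, _, note = line.partition('Notes: ')
    # drop the label and surrounding spaces, keeping the number as a string
    number = cust_part.replace('Customer Number:', '').strip()
    records.append((number, note.strip()))

print(records)
# [('001', 'Bought a ton of stuff and was easy to deal with'),
#  ('103', 'bought a ton of stuff got a free keychain')]
```

Keeping the number as a string preserves leading zeros such as "007".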
So, I used the facts that Notes lines are always longer than Customer Number lines, and that a customer never has more than 5 Notes, to code up what is clearly an absurd solution, even though it works:
DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []
for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])
Would anyone mind recommending a more stable and intelligent solution to this kind of problem?
Upvotes: 16
Views: 511
Reputation: 180391
Just keep track of when we find a new customer:
from pprint import pprint as pp
out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)
Output:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
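If the loop above raises NameError: name 'cust' is not defined, a note line came before any customer line. Below is a minimal sketch that wraps the same loop in a function and fails with a clearer message instead. The function name combine_notes and the choice to raise ValueError are assumptions for illustration, not part of this answer.

```python
def combine_notes(rows):
    """Prefix each note line with the most recent 'Customer Number' line."""
    cust = None  # no customer seen yet
    out = []
    for sub in rows:
        line = sub[0]
        if line.startswith("Customer Number"):
            cust = line
        elif cust is None:
            # assumed policy: refuse notes that have no customer to belong to
            raise ValueError("note before any customer: %r" % line)
        else:
            out.append(cust + line)
    return out


DF = [['Customer Number: 001 '],
      ['Notes: Bought a ton of stuff and was easy to deal with'],
      ['Customer Number: 007 '],
      ['Notes: looked a lot like James Bond'],
      ['Notes: came in with a martini']]

for line in combine_notes(DF):
    print(line)
```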
If a customer can appear again later in the file and you want all of their notes grouped together, use a dict:
from collections import defaultdict
from pprint import pprint as pp
d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])
pp(dict(d))  # a plain dict prints without the defaultdict(...) wrapper
Output:
{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
'stuff and was easy to deal with'],
'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
'James Bond',
'Customer Number: 007 Notes: came in with a '
'martini'],
'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
'stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral '
'to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address '
'birthday social security number on file'],
'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
'like Chris Farley on that hidden decaf skit '
'from SNL']}
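To check that the dict really does merge a customer who reappears, here is a small run on hypothetical input in which customer 001 shows up twice; the rows are the question's own lines, reordered for illustration.

```python
from collections import defaultdict

# Hypothetical input: customer 001 appears twice, separated by customer 666
rows = [['Customer Number: 001 '],
        ['Notes: Bought a ton of stuff and was easy to deal with'],
        ['Customer Number: 666 '],
        ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
        ['Customer Number: 001 '],
        ['Notes: gave us a referral to his uncles cousins hairdresser']]

d = defaultdict(list)
for sub in rows:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])

# Both notes for 001 end up under the same key
print(len(d['Customer Number: 001 ']))  # 2
print(len(d['Customer Number: 666 ']))  # 1
```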
Based on your comment and the error you got, you seem to have lines that come before the first customer, so we can attach them to that first customer:
# added ["foo"] before we see any customer
DF = [["foo"],['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
from pprint import pprint as pp
from itertools import takewhile, islice
# collect the lines that come before the first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))
out = []
ln = len(start)
# start is non-empty only if there were lines before the first customer
if start:
    # attach those lines to the first customer in the list
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln is 0 if start is empty, otherwise the index of the first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)
Which outputs:
['Customer Number: 001 foo',
'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
I presumed you would consider lines that come before any customer to actually belong to that first customer.
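The same assumption can also be handled in a single pass, without takewhile/islice, by buffering the leading lines until the first customer shows up. This is a swapped-in alternative rather than the answer's approach; the name combine_with_leading is hypothetical, and with this sketch the buffered lines are silently dropped if no customer ever appears.

```python
def combine_with_leading(rows):
    """Attach lines seen before the first customer to that first customer."""
    pending = []  # lines seen before any customer
    cust = None
    out = []
    for sub in rows:
        line = sub[0]
        if line.startswith("Customer Number"):
            cust = line
            # flush buffered leading lines onto the first customer found
            out.extend(cust + p for p in pending)
            pending = []
        elif cust is None:
            pending.append(line)
        else:
            out.append(cust + line)
    return out


DF = [["foo"],
      ['Customer Number: 001 '],
      ['Notes: Bought a ton of stuff and was easy to deal with'],
      ['Customer Number: 666 '],
      ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL']]

for line in combine_with_leading(DF):
    print(line)
```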
Upvotes: 13
Reputation: 238111
You can also use an OrderedDict, where the keys are customers and the values are lists of notes:
from collections import OrderedDict
DF_dict = OrderedDict()
for subl in DF:
    if 'Customer Number' in subl[0]:
        DF_dict[subl[0]] = []
        continue
    last_key = list(DF_dict.keys())[-1]
    DF_dict[last_key].append(subl[0])
for customer, notes in DF_dict.items():
    for a_note in notes:
        # the customer string already ends with a space
        print(customer + a_note)
Results in:
Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with
Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL
Customer Number: 103 Notes: bought a ton of stuff got a free keychain
Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser
Customer Number: 103 Notes: name address birthday social security number on file
Customer Number: 007 Notes: looked a lot like James Bond
Customer Number: 007 Notes: came in with a martini
Keeping the values in a dict like this is useful if you want to count how many notes a given customer has, or just select the notes for a given customer.
An alternative that avoids calling list(DF_dict.keys())[-1] on every iteration:
DF_dict = OrderedDict()
last_key = ''
for subl in DF:
    if 'Customer Number' in subl[0]:
        DF_dict[subl[0]] = []
        last_key = subl[0]
        continue
    DF_dict[last_key].append(subl[0])
And a shorter version, using defaultdict:
from collections import defaultdict
DF_dict = defaultdict(list)
for subl in DF:
    if 'Customer Number' in subl[0]:
        customer = subl[0]
        continue
    DF_dict[customer].append(subl[0])
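As a sketch of the counting and selecting mentioned earlier, the defaultdict version can be queried like this; note_counts is an illustrative name, and the data is a subset of the question's list.

```python
from collections import defaultdict

DF = [['Customer Number: 103 '],
      ['Notes: bought a ton of stuff got a free keychain'],
      ['Notes: gave us a referral to his uncles cousins hairdresser'],
      ['Notes: name address birthday social security number on file'],
      ['Customer Number: 007 '],
      ['Notes: looked a lot like James Bond'],
      ['Notes: came in with a martini']]

DF_dict = defaultdict(list)
for subl in DF:
    if 'Customer Number' in subl[0]:
        customer = subl[0]
        continue
    DF_dict[customer].append(subl[0])

# How many notes each customer has (keys stripped of the trailing space)
note_counts = {cust.strip(): len(notes) for cust, notes in DF_dict.items()}
print(note_counts)

# Just the notes for one customer; .get avoids inserting a new empty key,
# which plain indexing on a defaultdict would do for a missing customer
print(DF_dict.get('Customer Number: 007 ', []))
```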
Upvotes: 3
Reputation: 239443
Your basic objective is to group the notes and associate them with their customer. Since the list is already ordered (each customer line is followed by that customer's notes), you can simply use itertools.groupby, like this:
from itertools import groupby, chain
def build_notes(it):
    customer, func = "", lambda x: x.startswith('Customer')
    # flatten the argument (not the global DF) and group by the key function
    for item, grp in groupby(chain.from_iterable(it), key=func):
        if item:
            customer = next(grp)
        else:
            for note in grp:
                yield customer + note
            # In Python 3.x, you can simply do
            # yield from (customer + note for note in grp)
Here, we flatten the list of lists into a sequence of strings with chain.from_iterable. Then groupby collects consecutive lines that start with Customer separately from the lines that don't. For a run of customer lines, item will be True; for a run of notes, it will be False. When item is True, we take the customer information, and when item is False, we iterate over the grouped notes and yield one string at a time by concatenating the customer information with each note.
So, when you run the code,
print(list(build_notes(DF)))
you get
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
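To see what groupby hands to build_notes, it can help to print the key/group pairs for a shorter input. The helpers is_customer and show_groups are illustrative names; is_customer applies the same test as the answer's lambda. The second input, where customer 001 has no notes at all, is hypothetical and shows that consecutive customer lines land in the same True group, so next(grp) picks the first of them even though the notes that follow belong to the last.

```python
from itertools import chain, groupby


def is_customer(line):
    # same test the answer's lambda uses
    return line.startswith('Customer')


def show_groups(rows):
    # flatten the list of lists, then group consecutive lines by is_customer
    return [(key, list(grp))
            for key, grp in groupby(chain.from_iterable(rows), key=is_customer)]


DF = [['Customer Number: 103 '],
      ['Notes: bought a ton of stuff got a free keychain'],
      ['Notes: gave us a referral to his uncles cousins hairdresser'],
      ['Customer Number: 007 '],
      ['Notes: looked a lot like James Bond']]
for key, grp in show_groups(DF):
    print(key, grp)

# Hypothetical input: customer 001 has no notes
no_notes = [['Customer Number: 001 '],
            ['Customer Number: 007 '],
            ['Notes: looked a lot like James Bond']]
print(show_groups(no_notes))
# [(True, ['Customer Number: 001 ', 'Customer Number: 007 ']),
#  (False, ['Notes: looked a lot like James Bond'])]
```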
Upvotes: 4
Reputation: 3326
As long as you can count on the first element being a customer, you can do it like this.
Simply loop through each item. If the item is a customer, set the current customer as that string. Else, it is a note, so you append the customer and the note to the list of results.
customer = ""
results = []
for record in DF:
    data = record[0]
    if "Customer" in data:
        customer = data
    elif "Notes" in data:
        result = customer + data
        results.append(result)
print(results)
Upvotes: 2
Reputation: 21609
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
custnumstr = None
out = []
for df in DF:
    if df[0].startswith('Customer Number'):
        custnumstr = df[0]
    else:
        out.append(custnumstr + df[0])
for e in out:
    print(e)
Upvotes: 3
Reputation: 551
As long as the format is the same as your example, this should work.
final_list = []
for outer_list in DF:
    for s in outer_list:
        if s.startswith("Customer"):
            cust = s
        elif s.startswith("Notes"):
            final_list.append(cust + s)
for f in final_list:
    print(f)
Upvotes: 2