Python List splitting, sorting by date, then joining

Question

ok, I've been at this one for hours I admit defeat and beg for your mercy.

goal: I have multiple files (bank statement downloads), and I want to Merge, sort, remove duplicates.

the downloads are in this format:

"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"

one of my prime issues, aside from total ignorance of what I'm doing, is that the numerical values contain commas. I've written, successfully, code that strips such 'buried' commas out, I then strip the quotes so that I have a CSV...line.

so I now have my data in this format

['02/03/2015', ' 	 ', '241.25	 ', ' 	 ', 'Transaction Details
', '02/03/2015', ' 	 ', ' 	 ', '-155.49	 ', 'Transaction Details
', '03/03/2015', ' 	 ', '82.00	 ', ' 	 ', 'Transaction Details
', '03/03/2015', ' 	 ', '243.25	 ', ' 	 ', 'Transaction Details
', '02/03/2015', ' 	 ', '241.25	 ', ' 	 ', 'Transaction Details
']

which I believe makes it nearly ready to do a sort on first the element, but I think it's now one long list, instead of a list of lists.

I researched sorts and found the lambda...function, so I started to implement

new_file_data = sorted(new_file_data, key=lambda item: item[0])

but element [0] was just the " at the BOL.

I also noted that I needed to instruct that the date was not in, possibly, the correct format, which led me to this construct:

sorted(new_file_data, key=lambda d: datetime.strptime(d, '%d/%m/%Y'))

I get, loosely, the 'map' construct but not how to combine such that I can just reference element[0] as well as how to reference it (datewise)

and now I'm here, hopefully someone could push me over this hurdle? I think I need to [have] split the list better to start with so each line is an element - I did at one point get a sorted result but all the fields got globbed together, values (sorted) then dates then words etc

So if anyone could offer some advice on my failed list manipulation and how to structure that sort-lambda.

thanks to those who have the time and know how to respond to such starter queries.

RickardSjogren · Accepted Answer

If I understand correctly you want to read the contents of the csv and sort them by date.

Given the contents of data.csv

"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"

I would use the csv-module to read the data.

import csv
with open('data.csv') as f:
    data = [row for row in csv.reader(f)]

Which gives:

>>> data
[['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
 ['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14'],
 ['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
 ['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
 ['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
 ['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
 ['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
 ['04/03/2015', '-25.05', 'Transaction description', '2,314.07']]

Then you can use the datetime-module to provide a key for sorting.

import datetime 
sorted_data = sorted(data, key=lambda row: datetime.datetime.strptime(row[0], "%d/%m/%Y"))

Which gives:

>>> sorted_data
[['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
 ['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
 ['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
 ['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
 ['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
 ['04/03/2015', '-25.05', 'Transaction description', '2,314.07'],
 ['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
 ['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14']]

Python List splitting, sorting by date, then joining

Answers (2)

Related Questions