Reputation: 191
ok, I've been at this one for hours I admit defeat and beg for your mercy.
goal: I have multiple files (bank statement downloads), and I want to Merge, sort, remove duplicates.
the downloads are in this format:
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
one of my prime issues, aside from total ignorance of what I'm doing, is that the numerical values contain commas. I've written, successfully, code that strips such 'buried' commas out, I then strip the quotes so that I have a CSV...line.
so I now have my data in this format
['02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', ' \t ', '-155.49\t ', 'Transaction Details\n', '03/03/2015', ' \t ', '82.00\t ', ' \t ', 'Transaction Details\n', '03/03/2015', ' \t ', '243.25\t ', ' \t ', 'Transaction Details\n', '02/03/2015', ' \t ', '241.25\t ', ' \t ', 'Transaction Details\n']
which I believe makes it nearly ready to do a sort on first the element, but I think it's now one long list, instead of a list of lists.
I researched sorts and found the lambda...function, so I started to implement
new_file_data = sorted(new_file_data, key=lambda item: item[0])
but element [0] was just the " at the BOL.
I also noted that I needed to instruct that the date was not in, possibly, the correct format, which led me to this construct:
sorted(new_file_data, key=lambda d: datetime.strptime(d, '%d/%m/%Y'))
I get, loosely, the 'map' construct but not how to combine such that I can just reference element[0] as well as how to reference it (datewise)
and now I'm here, hopefully someone could push me over this hurdle? I think I need to [have] split the list better to start with so each line is an element - I did at one point get a sorted result but all the fields got globbed together, values (sorted) then dates then words etc
So if anyone could offer some advice on my failed list manipulation and how to structure that sort-lambda.
thanks to those who have the time and know how to respond to such starter queries.
Upvotes: 2
Views: 241
Reputation: 4238
If I understand correctly you want to read the contents of the csv and sort them by date.
Given the contents of data.csv
"08/04/2015","Balance","5,804.30","Current Balance for account 123S14"
"08/04/2015","Balance","5,804.30","Available Balance for account 123S14"
"02/03/2015","241.25","Transaction description","2,620.09"
"02/03/2015","-155.49","Transaction description","2,464.60"
"03/03/2015","82.00","Transaction description","2,546.60"
"03/03/2015","243.25","Transaction description","2,789.85"
"03/03/2015","-334.81","Transaction description","2,339.12"
"04/03/2015","-25.05","Transaction description","2,314.07"
I would use the csv
-module to read the data.
import csv
with open('data.csv') as f:
data = [row for row in csv.reader(f)]
Which gives:
>>> data
[['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14'],
['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07']]
Then you can use the datetime
-module to provide a key for sorting.
import datetime
sorted_data = sorted(data, key=lambda row: datetime.datetime.strptime(row[0], "%d/%m/%Y"))
Which gives:
>>> sorted_data
[['02/03/2015', '241.25', 'Transaction description', '2,620.09'],
['02/03/2015', '-155.49', 'Transaction description', '2,464.60'],
['03/03/2015', '82.00', 'Transaction description', '2,546.60'],
['03/03/2015', '243.25', 'Transaction description', '2,789.85'],
['03/03/2015', '-334.81', 'Transaction description', '2,339.12'],
['04/03/2015', '-25.05', 'Transaction description', '2,314.07'],
['08/04/2015', 'Balance', '5,804.30', 'Current Balance for account 123S14'],
['08/04/2015', 'Balance', '5,804.30', 'Available Balance for account 123S14']]
Upvotes: 2
Reputation: 534
You can define your own sorting function.
Do a mix of these two questions and you'll have what you want (or something close):
Python date string to date object
In your sorting function, tranform the date from string to datetime and compare
def cmp_items(a, b):
datetime_a = datetime.datetime.strptime(a.[0], "%d/%m/%Y").date()
datetime_b = datetime.datetime.strptime(a.[0], "%d/%m/%Y").date()
if datetime_a > datetime_b:
return 1
elif datetime_a == datetime_b:
return 0
else:
return -1
and then, you just have to sort the list using it
new_file_data = new_file_data.sort(cmp_items)
You can still have a little problem after that, the elements with the same date will be in an random-like order. You can improve the comparing function to compare more stuff to prevent that.
BTW, you have not stripped the burried commas out, it seems you have completely removed the last part.
Upvotes: 1