Reputation: 624
I get csv files containing share price data. A sample is as below.
'dttm','open','high','low','close'
"2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0
"2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0
"2014/01/01 09:18:00",6359.5,6359.7998046875,6358,6359,0
"2014/01/01 09:19:00",6358.9501953125,6359.4501953125,6357.5498046875,6359,0
"2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0
.....likewise till "2014/01/01 15:30:30" (and for further dates ahead)
Every row contains data for a minute.
Problem:-
Sometimes, a minute data is skipped. e.g. row for "2014/01/01 09:18:00" would not be present.
This hampers my program logic ahead.
What I require:-
is to validate the csv file whether rows for every 1 minute are present between 09:15:15 to 15:30:30 for every date. If not, copy previous row and insert for that minute (which is not present).
Can anybody pl. help?
Thanks.
Upvotes: 0
Views: 974
Reputation: 452
Here is sample code you can use:
from dateutil.parser import parse
from datetime import datetime, timedelta
data = [
("2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0),
("2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0),
("2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0),
]
def insert_into_db(date, open, high, low, close, zero):
print('inserting {} {} {} {} {} {}'.format(date, open, high, low, close, zero))
prev_date = None
for date, open, high, low, close, zero in data:
date = parse(date)
if prev_date is not None and date - prev_date > timedelta(minutes=1):
for i in reversed(range((date - prev_date).seconds // 60 - 1)):
date_between = date - timedelta(minutes=1 * i + 1)
insert_into_db(date_between, open, high, low, close, zero)
insert_into_db(date, open, high, low, close, zero)
prev_date = date
The output is:
inserting 2014-01-01 09:16:00 6365.2001953125 6369.89990234375 6355 6355 0
inserting 2014-01-01 09:17:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:18:00 6358.9501953125 6359.4501953125 6357.5498046875 6359 0
inserting 2014-01-01 09:19:00 6358.9501953125 6359.4501953125 6357.5498046875 6359 0
inserting 2014-01-01 09:20:00 6359 6359 6355.64990234375 6356.5 0
But you should make sure start and end minutes present (or adapt the code).
Update: fixed case if multiple minutes missing
Upvotes: 1
Reputation: 465
you could basically read two consecutive lines and get time delta. If it is not 1 minute, means you're missing a line. Just write to the csv with a new line character at the end! you could also write everything to a new CSV file.
import csv
import datetime
f = open("your_file.csv", "w+")
ff = csv.reader(f)
pre_line = ff.next()
while(True):
try:
cur_line = ff.next()
if cur_line - pre_line != # 1 minute difference:
f.write(pre_line)
f.write('/n')
except:
break
Upvotes: 1