Vineet
Vineet

Reputation: 624

python validate csv file data against model data

I get csv files containing share price data. A sample is as below.

'dttm','open','high','low','close'
"2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0
"2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0
"2014/01/01 09:18:00",6359.5,6359.7998046875,6358,6359,0
"2014/01/01 09:19:00",6358.9501953125,6359.4501953125,6357.5498046875,6359,0
"2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0
.....likewise till "2014/01/01 15:30:30"  (and for further dates ahead)

Every row contains data for a minute.

Problem:-
Sometimes, a minute data is skipped. e.g. row for "2014/01/01 09:18:00" would not be present.
This hampers my program logic ahead.

What I require:-
is to validate the csv file whether rows for every 1 minute are present between 09:15:15 to 15:30:30 for every date. If not, copy previous row and insert for that minute (which is not present).

Can anybody pl. help?
Thanks.

Upvotes: 0

Views: 974

Answers (2)

donnyyy
donnyyy

Reputation: 452

Here is sample code you can use:

from dateutil.parser import parse
from datetime import datetime, timedelta


data = [
    ("2014/01/01 09:16:00",6365.2001953125,6369.89990234375,6355,6355,0),
    ("2014/01/01 09:17:00",6355.64990234375,6359.9501953125,6355.5498046875,6359.5498046875,0),
    ("2014/01/01 09:20:00",6359,6359,6355.64990234375,6356.5,0),
]


def insert_into_db(date, open, high, low, close, zero):
    print('inserting {} {} {} {} {} {}'.format(date, open, high, low, close, zero))

prev_date = None
for date, open, high, low, close, zero in data:
    date = parse(date)

    if prev_date is not None and date - prev_date > timedelta(minutes=1):
        for i in reversed(range((date - prev_date).seconds // 60 - 1)):
            date_between = date - timedelta(minutes=1 * i + 1)
            insert_into_db(date_between, open, high, low, close, zero)

    insert_into_db(date, open, high, low, close, zero)
    prev_date = date

The output is:

inserting 2014-01-01 09:16:00 6365.2001953125 6369.89990234375 6355 6355 0
inserting 2014-01-01 09:17:00 6355.64990234375 6359.9501953125 6355.5498046875 6359.5498046875 0
inserting 2014-01-01 09:18:00 6358.9501953125 6359.4501953125 6357.5498046875 6359 0
inserting 2014-01-01 09:19:00 6358.9501953125 6359.4501953125 6357.5498046875 6359 0
inserting 2014-01-01 09:20:00 6359 6359 6355.64990234375 6356.5 0

But you should make sure start and end minutes present (or adapt the code).

Update: fixed case if multiple minutes missing

Upvotes: 1

SamAtWork
SamAtWork

Reputation: 465

you could basically read two consecutive lines and get time delta. If it is not 1 minute, means you're missing a line. Just write to the csv with a new line character at the end! you could also write everything to a new CSV file.

import csv
import datetime
f = open("your_file.csv", "w+")
ff = csv.reader(f)    
pre_line = ff.next()
while(True):
    try:
        cur_line = ff.next()
        if cur_line - pre_line != # 1 minute difference:
            f.write(pre_line)
            f.write('/n')
    except:
        break

Upvotes: 1

Related Questions