Firstname
Firstname

Reputation: 365

Reading and calculation using csv

I'm new to python and pardon me if this question might sound silly -

I have csv file that has 2 columns - Value and Timestamp. I'm trying to write a code that would take 2 paramenters - start_date and end_date and traverse the csv file to obtain all the values between those 2 dates and print the sum of Value

Below is my code. I'm trying to read and store the values in a list.

f_in = open('Users2.csv').readlines()
Value1 = []
Created = []
for i in range(1, len(f_in)):
    Value, created_date = f_in[i].split(',')
    Value1.append(Value)
    Created.append(created_date)

print Value1
print Created

My csv has the following format

10  2010-02-12 23:31:40
20  2010-10-02 23:28:11
40  2011-03-12 23:39:40
10  2013-09-10 23:29:34
420 2013-11-19 23:26:17
122 2014-01-01 23:41:51

When I run my code - File1.py as below

File1.py 2010-01-01 2011-03-31

The output should be 70

I'm running into the following issues -

  1. The data in csv is in timestamp (created_date), but the parameter passed should be date and I need to convert and get the data between those 2 dates regardless of time.
  2. Once I have it in list - as described above - how do I proceed to do my calculation considering the condition in point-1

Upvotes: 0

Views: 74

Answers (4)

campovski
campovski

Reputation: 3153

Since you said that dates are in timestamp, you can compare them like strings. By realizing that, what you want to achieve (sum the values if created is between start_date and end_date) can be done like this:

def sum_values(start_date, end_date):
    sum = 0
    with open('Users2.csv') as f:
        for line in f:
            value, created = line.split(' ', 1)
            if created > start_date && created < end_date:
                sum += int(value)
    return sum

str.split(' ', 1) will split on ' ' but will stop splitting after 1 split has been done. start_date and end_date must be in format yyyy-MM-dd hh:mm:ss which I assume they are, cause they are in timestamp format. Just mind it.

Upvotes: 0

Aziz Alto
Aziz Alto

Reputation: 20311

If you are open to using pandas, try this:

>>> import pandas as pd
>>> data = 'Users2.csv'
>>>
>>> dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
>>> df = pd.read_csv(data, names=['value', 'date'], parse_dates=['date'], date_parser=dateparse)

>>> result = df['value'][(df['date'] > '2010-01-01') &
...                      (df['date'] < '2011-03-31')
...                 ].sum()
>>> result
70

enter image description here

Upvotes: 0

Pythonist
Pythonist

Reputation: 697

Depends on your file size, but you may consider putting values from csv file, into some database, and then query your results.

csv module has DictReader which allows you to predefine your column names, it greatly improves readability, specially while working on really big files.

from datetime import datetime

COLUMN_NAMES = ['value', 'timestamp']


def sum_values(start_date, end_date):
    sum = 0

    with open('Users2.csv', mode='r') as csvfile:
        table = csv.DictReader(csvfile, fieldnames=COLUMN_NAMES)
        for row in table:
            if row['timestamp'] >= min_date and row['timestamp'] <= max_date:
                sum += int(row['value'])
    return sum

Upvotes: 0

Ajax1234
Ajax1234

Reputation: 71451

You can try this:

import csv

data = csv.reader(open('filename.csv'))
start_date = 10
end_data = 30

times = [' '.join(i) for i in data if int(i[0]) in range(start_date, end_date)]

Upvotes: 1

Related Questions