Reputation: 365
I'm new to python and pardon me if this question might sound silly -
I have csv file that has 2 columns - Value and Timestamp
. I'm trying to write a code that would take 2 paramenters - start_date
and end_date
and traverse the csv file to obtain all the values between those 2 dates and print the sum of Value
Below is my code. I'm trying to read and store the values in a list.
f_in = open('Users2.csv').readlines()
Value1 = []
Created = []
for i in range(1, len(f_in)):
Value, created_date = f_in[i].split(',')
Value1.append(Value)
Created.append(created_date)
print Value1
print Created
My csv has the following format
10 2010-02-12 23:31:40
20 2010-10-02 23:28:11
40 2011-03-12 23:39:40
10 2013-09-10 23:29:34
420 2013-11-19 23:26:17
122 2014-01-01 23:41:51
When I run my code - File1.py
as below
File1.py 2010-01-01 2011-03-31
The output should be 70
I'm running into the following issues -
Upvotes: 0
Views: 74
Reputation: 3153
Since you said that dates are in timestamp, you can compare them like strings. By realizing that, what you want to achieve (sum the value
s if created
is between start_date
and end_date
) can be done like this:
def sum_values(start_date, end_date):
sum = 0
with open('Users2.csv') as f:
for line in f:
value, created = line.split(' ', 1)
if created > start_date && created < end_date:
sum += int(value)
return sum
str.split(' ', 1)
will split on ' '
but will stop splitting after 1 split has been done. start_date
and end_date
must be in format yyyy-MM-dd hh:mm:ss
which I assume they are, cause they are in timestamp format. Just mind it.
Upvotes: 0
Reputation: 20311
If you are open to using pandas
, try this:
>>> import pandas as pd
>>> data = 'Users2.csv'
>>>
>>> dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
>>> df = pd.read_csv(data, names=['value', 'date'], parse_dates=['date'], date_parser=dateparse)
>>> result = df['value'][(df['date'] > '2010-01-01') &
... (df['date'] < '2011-03-31')
... ].sum()
>>> result
70
Upvotes: 0
Reputation: 697
Depends on your file size, but you may consider putting values from csv
file, into some database, and then query your results.
csv
module has DictReader
which allows you to predefine your column names, it greatly improves readability, specially while working on really big files.
from datetime import datetime
COLUMN_NAMES = ['value', 'timestamp']
def sum_values(start_date, end_date):
sum = 0
with open('Users2.csv', mode='r') as csvfile:
table = csv.DictReader(csvfile, fieldnames=COLUMN_NAMES)
for row in table:
if row['timestamp'] >= min_date and row['timestamp'] <= max_date:
sum += int(row['value'])
return sum
Upvotes: 0
Reputation: 71451
You can try this:
import csv
data = csv.reader(open('filename.csv'))
start_date = 10
end_data = 30
times = [' '.join(i) for i in data if int(i[0]) in range(start_date, end_date)]
Upvotes: 1