Reputation: 105

CSV find max in column and append new data

I asked a question about two hours ago about reading and writing data from a website. Since then I've been trying to find a way to read the maximum date value from column 'A' of the output, compare that value to the refreshed website data, and append any new data to the csv file without overwriting the old data or creating duplicates.

The code that is currently 100% working is this:

import requests
symbol = "mtgoxUSD"
url = 'http://api.bitcoincharts.com/v1/trades.csv?symbol={}'.format(symbol)
data = requests.get(url)
with open("trades_{}.csv".format(symbol), "r+") as f:
    f.write(data.text)

I've tried various ways of finding the maximum value of column 'A': several approaches using dicts and other methods of sorting/finding the max, and even the pandas and numpy libs. None of them seem to work. Could someone point me in the direction of a decent way to find the maximum of a column from the .csv file? Thanks!

Upvotes: 3

Answers (3)

steveha

Reputation: 76715

I'll give you two answers, one that just returns the max value, and one that returns the row from the CSV that includes the max value.

import csv
import operator as op
import requests

symbol = "mtgoxUSD"
url = 'http://api.bitcoincharts.com/v1/trades.csv?symbol={}'.format(symbol)
csv_file = "trades_{}.csv".format(symbol)

data = requests.get(url)
with open(csv_file, "w") as f:
    f.write(data.text)

with open(csv_file) as f:
    next(f) # discard first row from file -- see notes
    max_value = max(row[0] for row in csv.reader(f))

with open(csv_file) as f:
    next(f) # discard first row from file -- see notes
    max_row = max(csv.reader(f), key=op.itemgetter(0))

Notes:

max() can directly consume an iterator, and csv.reader() gives us an iterator, so we can just pass that in. I'm assuming you might need to throw away a header line so I showed how to do that. If you had multiple header lines to discard, you might want to use islice() from the itertools module.
In the first one, we use a "generator expression" to select a single value from each row, and find the max. This is very similar to a "list comprehension" but it doesn't build a whole list, it just lets us iterate over the resulting values. Then max() consumes the iterable and we get the max value.
max() can take a key= argument where you specify a "key function". It will call the key function on each item and use the returned value to determine the max... but the value returned by max() will be the unmodified original item (in this case, a row from the CSV). Here the key function is manufactured for you by operator.itemgetter()... you pass in which column you want, and operator.itemgetter() builds a function for you that gets that column.

The resulting function is the equivalent of:

def get_col_0(row):
    return row[0]
max_row = max(csv.reader(f), key=get_col_0)

Or, people often use a lambda for this:

max_row = max(csv.reader(f), key=lambda row: row[0])

But I think operator.itemgetter() is convenient and nice to read. And it's fast.
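
One caveat with either form: csv.reader() yields every field as a string, so max() compares the column values as text. That is fine for values of equal length, but it can give surprising results when the widths differ. Here is a minimal sketch of the difference, using made-up sample rows rather than the real feed and assuming the column holds integers:

```python
import csv
import io

# Made-up sample data (not from the real feed): timestamp, price, amount
sample = io.StringIO("999,5.0,1.0\n1000,6.0,2.0\n")
rows = list(csv.reader(sample))

# String comparison: "999" sorts after "1000" because "9" > "1"
max_as_text = max(rows, key=lambda row: row[0])

# Numeric comparison: convert the column inside the key function
max_as_number = max(rows, key=lambda row: int(row[0]))

print(max_as_text[0])    # 999
print(max_as_number[0])  # 1000
```

If the column really is numeric, converting inside the key function keeps the returned row unmodified while comparing by number.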

I showed saving the data in a file, then pulling from the file again. If you want to go through the data without saving it anywhere, you just need to iterate over it by lines.

Perhaps something like:

text = data.text
rows = [line.split(',') for line in text.split("\n") if line]
rows.pop(0)  # get rid of first row from data
max_value = max(row[0] for row in rows)
max_row = max(rows, key=op.itemgetter(0))

I don't know which column you want... column "A" might be column 0 so I used 0 in the above. Replace the column number as you like.
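
A small variation on the in-memory version: csv.reader() accepts any iterable of lines, not just a file, so the text can be parsed without splitting on commas by hand. That matters if a field is ever quoted and contains a comma. A minimal sketch, with made-up sample text standing in for data.text and assuming column 0 holds integers:

```python
import csv

# Stand-in for data.text; made-up sample with a header row
text = 'time,price,amount\n1000,"1,234.5",1.0\n1001,6.0,2.0\n'

reader = csv.reader(text.splitlines())
next(reader)  # discard the header row, as in the code above
rows = list(reader)

max_row = max(rows, key=lambda row: int(row[0]))
print(rows[0])  # the quoted price stays a single field
print(max_row)
```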

Upvotes: 1

erewok

Reputation: 7835

It seems like something like this should work:

import requests
import csv
symbol = "mtgoxUSD"
url = 'http://api.bitcoincharts.com/v1/trades.csv?symbol={}'.format(symbol)
data = requests.get(url)
with open("trades_{}.csv".format(symbol), "r+") as f:
    all_values = list(csv.reader(f))
    max_value = max([int(row[2]) for row in all_values[1:]])
    # (write out the value here?)

EDITS: I used "row[2]" because that was the sample column I was taking max of in my csv. Also, I had to strip off the column headers, which were all text, which was why I looked at "all_values[1:]" from the second row to the end of the file.
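
For the "append only new data" part, one possible approach is to compare each freshly fetched row against the max already in the file and append only the rows that are newer. The following is just a sketch under some assumptions: append_new_rows is a hypothetical helper name, the file has no header row, the chosen column holds integers (such as Unix timestamps), and the existing file ends with a newline. The sample rows are made up.

```python
import csv
import os
import tempfile

def append_new_rows(path, fetched_text, column=0):
    """Append rows from fetched_text whose `column` value exceeds the file's max.

    Hypothetical helper; assumes no header row and an integer column.
    """
    current_max = None
    if os.path.exists(path):
        with open(path, newline="") as f:
            values = [int(row[column]) for row in csv.reader(f) if row]
        if values:
            current_max = max(values)

    fetched = [row for row in csv.reader(fetched_text.splitlines()) if row]
    new_rows = [
        row for row in fetched
        if current_max is None or int(row[column]) > current_max
    ]

    # Mode "a" adds to the end of the file and creates it if missing
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(new_rows)
    return len(new_rows)

# Usage with made-up data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trades_sample.csv")
    first = append_new_rows(path, "1000,5.0,1.0\n1001,6.0,2.0\n")
    second = append_new_rows(path, "1001,6.0,2.0\n1002,7.0,3.0\n")
    with open(path, newline="") as f:
        saved = list(csv.reader(f))

print(first, second, len(saved))
```

Note that the strict "greater than" comparison avoids duplicates but would also skip a new row that shares the latest value already in the file, so it may need adjusting for data where that can happen.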

Upvotes: 1

Ryan Saxe

Reputation: 17859

If you have it in a pandas DataFrame, you can get the max of any column like this:

>>> max(data['time'])
'2012-01-18 15:52:26'

where data is the variable name for the DataFrame and time is the name of the column.

Upvotes: 2
