Reputation: 49
I am beginning to learn python and am struggling with Syntax.
I have a simple CSV file that looks like this
0.01,10,20,0.35,40,50,60,70,80,90,100
2,22,32,42,52,62,72,82,92,102,112
3,33,43,53,63,5647,83,93,103,113,123
I want to look for the highest and lowest value in all the data in the csv file except in the first value of each row.
So effectively the answer here would be
highestValue=5647
lowestValue=0.35
because the data that is looked at is as follows (it ignored the first value of each row)
10,20,0.35,40,50,60,70,80,90,100
22,32,42,52,62,72,82,92,102,112
33,43,53,63,73,5647,93,103,113,123
I would like my code to work for ANY row length.
I really have to admit I'm struggling but here's what I've tried. I usually program PHP so this is all new to me. I have been working on this simple task for a day and can't fathom it out. I think I'm getting confused with terminology 'lists' for example.
import numpy
test_data_file = open ("Anaconda3JamesData/james_test_3.csv","r")
test_data_list = test_data_file.readlines()
test_data_file.close()
for record in test_data_list:
all_values = record.split(',')
maxvalue = np.max(numpy.asfarray(all_values[1:])
print (maxvalue)
With the test data (the CSV file shown at the very top of this question) I would expect the answer to be
highestValue=5647
lowestValue=0.35
Upvotes: 1
Views: 447
Reputation: 5473
Here's a pandas
solution that can give the desired results:
import pandas as pd
df = pd.read_csv('test1.csv', header=None)
# df:
# 0 1 2 3 4 5 6 7 8 9 10
# 0 0.01 10 20 0.35 40 50 60 70 80 90 100
# 1 2.00 22 32 42.00 52 62 72 82 92 102 112
# 2 3.00 33 43 53.00 63 5647 83 93 103 113 123
df = df.iloc[:, 1:]
print("Highest value: {}".format(df.values.max()))
print("Lowest value: {}".format(df.values.min()))
#Output:
Highest value: 5647.0
Lowest value: 0.35
Upvotes: 0
Reputation:
If you're using numpy, you can read your csv file as a numpy.ndarray
using numpy.genfromtxt()
and then use the array's .max()
and .min()
methods
import numpy
array = numpy.genfromtxt('Anaconda3JamesData/james_test_3.csv', delimiter=',')
array[:, 1:].max()
array[:, 1:].min()
The [:, 1:]
part is using numpy's array indexing. It's saying take all the rows (the first [:,
part), and for each row take all but the first column (the 1:]
part) . This doesn't work with Python's built in lists.
Upvotes: 1
Reputation: 16633
I think using numpy
is unneeded for this task. First of all, this:
test_data_file = open ("Anaconda3JamesData/james_test_3.csv","r")
test_data_list = test_data_file.readlines()
test_data_file.close()
for record in test_data_list:
can be simplified into this:
with open("Anaconda3JamesData/james_test_3.csv","r") as test_data_file:
for record in test_data_file:
We can use a list comprehension to read in all of the values:
with open("Anaconda3JamesData/james_test_3.csv","r") as test_data_file:
values = [float(val) for line in test_data_file for val in line.split(",")[1:]]
values
now contains all relevant numbers, so we can just do:
highest_value = max(values)
lowest_value = min(values)
Upvotes: 0
Reputation: 57135
You do not need the power of numpy for this problem. A simple CSV reader is good enough:
with open("Anaconda3JamesData/james_test_3.csv") as infile:
r = csv.reader(infile)
rows = [list(map(float, line))[1:] for line in r]
max(map(max, rows))
# 5647.0
min(map(min, rows))
# 0.35
Upvotes: 0
Reputation: 782693
You're overwriting maxvalue
each time through the loop, so you're just getting the max value from the last line, not the whole file. You need to compare with the previous maximum.
maxvalue = None
for record in test_data_list:
all_values = record.split(',')
if maxvalue is None:
maxvalue = np.max(numpy.asfarray(all_values[1:])
else:
maxvalue = max(maxvalue, np.max(numpy.asfarray(all_values[1:]))
Upvotes: 0