Reputation: 575
My Data looks like this:
549 648.077 0.01
552 648.141 0.45
554 647.167 0.1
572 648.141 0.3
530 630.213 0.69
560 670.312 0.70
there are a few thousand lines in the file
the 1st
row values range from 0-1100
the 2nd
row values range from 600-700
the 3rd
row values range from 0-1
I need to plot the data and therefore need to sort and modify the data:
I need to split the 3rd row values (normal range 0.0-1.0
) into segments 0.0-0.20
, 0.21-0.40
, 0,41-0,60
, 0.61-0.80
,0.81-1.00
Next I need to split the segments from the 1st
row (normal range (0-1100
) into segments like 0-10
, 11-20
,21-30
and so on up to 1100.
What I want to do is find all 2nd
row values within a region 0.0-0.20
and 0-10
, 0.0-0.20
and 11-20
,0.0-0.20
and 21-30
.
When found I want to add them all together and divide the value by the number of appearances to get the mean value: so I want for a region between 0.0-0.20
and 0-10
one value
. I'm fairly new to python and I think that this is some kind of approach:
import os
import csv
dataList = []
with open("table.csv") as csv_file:
data_reader = csv.reader(csv_file, dialect='excel-tab')
for rows in data_reader:
if float(rows[2]) <= 0.20:
if float(rows[0]) <= 10:
print(rows)
if 10 <float(rows[0])<=20:
print (rows)
That should work (without the print of course) to get the values, repeated than for if 20<float(rows[2])<= 30:
..... That should bring me the values I want but is there an easy way to set a range going from 0-1100 in 10 units step?
P.S.: I am aware that I gave lots of Info for a relative short question and that's because I don't really know if python is the best way to do this and if my approach is reasonable? Maybe I should go with panda
but that I just installed. So in case anyone knows an easier (maybe not coding related) way to solve a problem like this I'd really appreciate it.
Upvotes: 1
Views: 79
Reputation: 69041
You need a sorting key to put your data in the correct order; then you can cycle through it to group it and get your sums. itertools.groupby
is excellent for this type of thing:
import os
import csv
from itertools import groupby
data_list = []
with open("table.csv") as csv_file:
data_reader = csv.reader(csv_file, dialect='excel-tab')
for row in data_reader:
# read and convert data
data_list.append((int(row[0]), float(row[1]), float(row[2])))
def classify(row):
# returns classification of row based on third value, then first value
# i.e. [549, 648.077, 0.01] -> (0.0, 540)
# [572, 648.141, 0.3] -> (0.3, 570)
first, second, third = row
for level3 in (0.2, 0.4, 0.6, 0.8):
if third <= level3:
break
else:
level3 = 1.0
for level1 in range(10, 1100, 10):
if first <= level1:
break
else:
level1 = 1100
return level3, level1
data_list.sort(key=classify)
plot_values = []
for key, group in groupby(data_list, classify):
group = list(group)
number_of_elements = len(group)
mean = sum([r[2] for r in group]) / number_of_elements
plot_values.append((key, mean))
print('\n'.join(['%-10s -> %s' % (k, m) for k, m in plot_values]))
When run against the sample data you provided:
(0.2, 550) -> 0.01
(0.2, 560) -> 0.1
(0.4, 580) -> 0.3
(0.6, 560) -> 0.45
(0.8, 530) -> 0.69
(0.8, 560) -> 0.7
Upvotes: 1