Reputation: 79
I have a .txt file like this:
#day hr T 0.1 d.C.
1 1 137
1 2 124
1 3 130
1 4 128
1 5 141
1 6 127
1 7 153
1 8 137
1 9 158
1 10 166
...
2 1 136
2 2 135
2 3 135
2 4 132
and so on...
I wrote this code:
import sys
NUMBEROFDAYS = []
NUMBEROFHOURS = []
Temp = []
for line in sys.stdin:
    x = (line[0:2])
    NUMBEROFDAYS.append(x)
What I get is:
['#d', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', and it goes on...
However, I need to extract the relevant integers from the text. How do I do that?
My final goal is to compute the average temperature for each day (the temperature is represented in the 3rd column).
Upvotes: 1
Views: 548
Reputation: 3159
Since you need to group the data by day (first column), this seems a typical case for itertools' groupby():
from itertools import groupby
# keep only the lines made up entirely of digits once whitespace is removed
# (this skips the "#day ..." header line):
valid = [l for l in open("/path/to/file.txt").readlines() if "".join(l.split()).isdigit()]
# split valid lines into numbers
data = [[int(n) for n in line.split()] for line in valid]
# group data by day (first number of the line)
day_data = [[item, list(records)] for item, records in groupby(data, key = lambda r: r[0])]
for day in day_data:
    temps = day[1]
    # average of the third column (temperature) for this day
    print(day[0], sum([r[2] for r in temps])/float(len(temps)))
With your lines, this will output:
1 140.1
2 134.5
First, we read the text file as a list of lines:
open("/path/to/file.txt").readlines()
Next, we check that every character in the line is a digit once all whitespace has been removed:
if "".join(l.split()).isdigit()
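To see what this filter keeps and drops, here is a small sketch. The sample strings are made up for illustration, and is_all_digits is just a helper name chosen here. Note that str.isdigit() is False for a minus sign or a decimal point, so a line holding a negative or fractional reading would be skipped as well.

```python
# Hypothetical sample lines, tab-separated like the file in the question.
samples = ["#day\thr\tT 0.1 d.C.\n", "1\t1\t137\n", "\n", "1\t2\t-5\n"]

def is_all_digits(line):
    # Remove every whitespace character, then test what is left.
    return "".join(line.split()).isdigit()

kept = [s for s in samples if is_all_digits(s)]
print(kept)
```

Only the plain "1 1 137" line survives: the header, the empty line, and the line with a negative value are all dropped.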
Then we split each of the valid lines into a list of three integers:
data = [[int(n) for n in line.split()] for line in valid]
Then we use groupby to group the data by day (which is the first integer of each line):
day_data = [[item, list(records)] for item, records in groupby(data, key = lambda r: r[0])]
This gives us two records, one for each day:
1, [[1, 1, 137], [1, 2, 124], [1, 3, 130], [1, 4, 128], [1, 5, 141], [1, 6, 127], [1, 7, 153], [1, 8, 137], [1, 9, 158], [1, 10, 166]]
and:
2, [[2, 1, 136], [2, 2, 135], [2, 3, 135], [2, 4, 132]]
Subsequently, we print the day, followed by the average of the third column for that specific day:
for day in day_data:
    temps = day[1]
    print(day[0], sum([r[2] for r in temps])/float(len(temps)))
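One caveat worth keeping in mind: groupby() only merges consecutive items that share a key, so the approach above relies on the file being ordered by day. A small sketch with made-up records shows the difference; sorting by day first is one way around it. The names records and day_key are chosen here for illustration.

```python
from itertools import groupby

# Made-up records (day, hour, temperature); day 1 appears in two separate runs.
records = [[1, 1, 137], [2, 1, 136], [1, 2, 124]]

def day_key(r):
    # The day is the first number of each record.
    return r[0]

# Without sorting, each run of equal keys becomes its own group.
unsorted_days = [day for day, _ in groupby(records, key=day_key)]
# Sorting by the same key first puts all records of a day next to each other.
sorted_days = [day for day, _ in groupby(sorted(records, key=day_key), key=day_key)]

print(unsorted_days)
print(sorted_days)
```

In the unsorted case day 1 shows up as two groups, which would produce two separate averages for the same day.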
Upvotes: 1
Reputation: 140168
You're mixing up fields and characters. You have to split each line and convert the resulting strings to integers.
Then you need one list per day, so it's better to use a dictionary that maps each day to its list of temperatures, and print the mean of each day at the end.
(Note that the 2nd column is completely unused.)
import sys
from collections import defaultdict
d = defaultdict(list) # dictionary: key=day, value=list of temperatures
sys.stdin.readline() # skip the header line
for line in sys.stdin:
    # split the line on whitespace (this drops the tabs) and convert each field to an integer
    fields = [int(f) for f in line.split()]
    d[fields[0]].append(fields[2]) # add the temperature to its day
for day,temps in sorted(d.items()):
    print("day {}, average temp {}".format(day,float(sum(temps))/len(temps)))
result:
day 1, average temp 140.1
day 2, average temp 134.5
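To make the "fields versus characters" point concrete, here is a minimal sketch on a single line. It assumes the columns are tab-separated, as the '1\t' entries in the question's output suggest; the variable names are chosen here for illustration.

```python
line = "1\t10\t166\n"  # one data line, assumed tab-separated

first_two_chars = line[0:2]   # slicing works on characters
fields = line.split()         # splitting works on whitespace-separated fields
day, hour, temp = [int(f) for f in fields]

print(repr(first_two_chars))
print(fields)
print(day, hour, temp)
```

The slice returns the digit together with the tab that follows it, while split() returns the three columns as separate strings that int() can convert.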
Upvotes: 0