jozofe

Reputation: 79

How can I extract average temperature per day from a list of measurements?

I have a .txt file like this:

#day hr T 0.1 d.C.
1    1  137
1    2  124
1    3  130
1    4  128
1    5  141
1    6  127
1    7  153
1    8  137
1    9  158
1    10 166
...
2   1   136
2   2   135
2   3   135
2   4   132
and so on...

I wrote this code:

import sys

NUMBEROFDAYS = []
NUMBEROFHOURS = []
Temp = []

for line in sys.stdin:
    x = (line[0:2])
    NUMBEROFDAYS.append(x)

What I get is:

['#d', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t', '1\t',   and it goes on...

However, I need to extract the relevant integers from the text. How do I do that?

My final goal is to compute the average temperature for each day (the temperature is represented in the 3rd column).

Upvotes: 1

Views: 548

Answers (2)

Jacob Vlijm

Reputation: 3159

Since you need to group the data by day (the first column), this seems a typical case for itertools' groupby():

from itertools import groupby

# keep only the lines that contain nothing but digits and whitespace (this skips the header):
valid = [l for l in open("/path/to/file.txt").readlines() if "".join(l.split()).isdigit()]
# split valid lines into numbers
data = [[int(n) for n in line.split()] for line in valid]
# group data by day (first number of the line)
day_data = [[item, list(records)] for item, records in groupby(data, key = lambda r: r[0])]
for day in day_data:
    temps = day[1]
    print(day[0], sum([r[2] for r in temps])/float(len(temps)))

With your lines, this will output:

1 140.1
2 134.5

Explanation

  • First, we read the text file as a list of lines:

    open("/path/to/file.txt").readlines()
    
  • We check that every character is a digit after removing all whitespace (this skips the header line):

     if "".join(l.split()).isdigit()
    
  • Then we split each of the valid lines into a list of three integers:

    data = [[int(n) for n in line.split()] for line in valid]
    
  • Then we use groupby() to group the data by day (which is the first integer of each line):

    day_data = [[item, list(records)] for item, records in groupby(data, key = lambda r: r[0])]
    

    This gives us two records, one for each day:

    1, [[1, 1, 137], [1, 2, 124], [1, 3, 130], [1, 4, 128], [1, 5, 141], [1, 6, 127], [1, 7, 153], [1, 8, 137], [1, 9, 158], [1, 10, 166]]
    

    and:

    2, [[2, 1, 136], [2, 2, 135], [2, 3, 135], [2, 4, 132]]
    
  • Subsequently, we print the day, followed by the average of the third column for that specific day:

    for day in day_data:
        temps = day[1]
        print(day[0], sum([r[2] for r in temps])/float(len(temps)))
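
The steps above can be collected into one function. This is only a minimal sketch, not part of the code above: the names parse_row and daily_averages are made up for illustration. It makes two changes worth noting. Rows are parsed with int() inside try/except instead of being filtered with isdigit(), so a negative reading such as -12 would not be silently skipped. The rows are also sorted by day before grouping, because groupby() only merges consecutive rows that share a key.

```python
from itertools import groupby
from operator import itemgetter


def parse_row(line):
    """Return [day, hour, temp] as integers, or None for any other line."""
    fields = line.split()
    if len(fields) != 3:
        return None  # header, blank line, or malformed row
    try:
        return [int(f) for f in fields]
    except ValueError:
        return None


def daily_averages(lines):
    """Return a list of (day, mean temperature) tuples, ordered by day."""
    rows = [row for row in map(parse_row, lines) if row is not None]
    # groupby() only groups adjacent rows, so sort by day first.
    rows.sort(key=itemgetter(0))
    result = []
    for day, group in groupby(rows, key=itemgetter(0)):
        temps = [r[2] for r in group]
        result.append((day, sum(temps) / float(len(temps))))
    return result


# Days are deliberately interleaved here to show why the sort matters.
sample = """#day hr T 0.1 d.C.
1    1  137
1    2  124
2    1  136
1    3  130
2    2  135
"""
result = daily_averages(sample.splitlines())
for day, mean in result:
    print(day, mean)
```

For a file on disk, the same function could be called as daily_averages(f) inside a "with open(...) as f:" block, which also closes the file afterwards.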
    

Upvotes: 1

Jean-François Fabre

Reputation: 140168

You're mixing up fields and characters. You have to split each line into fields and convert those strings to integers.
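
To illustrate the difference on a single line (a small sketch; the sample line is an assumption based on the tab-separated output shown in the question):

```python
line = "1\t10\t166\n"

# Slicing works on characters: the first two are '1' and the tab.
print(repr(line[0:2]))

# split() works on fields: it cuts on any run of whitespace.
fields = line.split()
print(fields)

# Each field is still a string, so convert it to an integer.
day, hour, temp = (int(f) for f in fields)
print(day, hour, temp)
```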

Then you have to create one list per day, so it's better to use a dictionary that maps each day to its list of temperatures, and print the mean for each day at the end.

(note that the 2nd column is completely unused)

import sys

from collections import defaultdict

d = defaultdict(list)  # dictionary: key = day, value = list of temperatures

sys.stdin.readline()  # skip the header line
for line in sys.stdin:
    # split the line on whitespace (this also removes the tabs) and convert each field to an integer
    fields = [int(f) for f in line.split()]
    d[fields[0]].append(fields[2])   # add the temperature to that day's list

for day,temps in sorted(d.items()):
    print("day {}, average temp {}".format(day,float(sum(temps))/len(temps)))

result:

day 1, average temp 140.1
day 2, average temp 134.5
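
The same dictionary approach can also be wrapped in a function that accepts any iterable of lines (sys.stdin, an open file, or a list). This is a sketch under assumptions: the name average_by_day is hypothetical, and the division by 10 assumes that the header "T 0.1 d.C." means the readings are in tenths of a degree Celsius.

```python
from collections import defaultdict


def average_by_day(lines):
    """Map each day to its mean temperature in degrees Celsius."""
    temps_by_day = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip blank lines and the '#day hr T ...' header.
        if not fields or fields[0].startswith("#"):
            continue
        day, _hour, temp = (int(f) for f in fields)
        temps_by_day[day].append(temp)
    # Assumption: readings are in tenths of a degree, hence the factor 10.
    return {day: sum(temps) / (10.0 * len(temps))
            for day, temps in temps_by_day.items()}


sample = ["#day hr T 0.1 d.C.", "1 1 137", "1 2 124", "2 1 136"]
averages = average_by_day(sample)
for day in sorted(averages):
    print("day {}, average temp {} C".format(day, averages[day]))
```

Reading from standard input as in the code above would then be average_by_day(sys.stdin), with no separate readline() call needed for the header.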

Upvotes: 0
