KF18290
KF18290

Reputation: 7

Get readings from duplicate names in CSV file Python

I am fairly new at Python and am having some issues reading in my csv file. There are sensor names, datestamps and readings in each column. However, there are multiple of the same sensor name, which I have already made a list of the different options called OPTIONS, shown below

OPTIONS = []

with open('sensor_data.csv', 'rb') as f:

    reader = csv.reader(f, delimiter = ',')

    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2]) 
        print(OPTIONS)

Options prints fine,

But now I am having issues retrieving any readings, and using them to calculate average and maximum readings for each unique sensor name.

here are a few lines of sensor_data.csv, which goes from 2018-01-01 to 2018-12-31 for sensor_1 to sensor_25.

Any help would be appreciated.

Upvotes: 0

Views: 92

Answers (2)

balderman
balderman

Reputation: 23815

A shorter version below

import csv
from collections import defaultdict

readings = defaultdict(list)

with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        readings[row[0]].append(float(row[2]) )

for sensor_name,values in readings.items():
    print('Sensor: {}, Max readings: {}, Avg: {}'.format(sensor_name,max(values), sum(values)/ len(values)))

Upvotes: 0

VietHTran
VietHTran

Reputation: 2318

What you have for the readings variable is just the reading of each row. One way to get the average readings is to keep track of the sum and count of readings (sum_readings and count_readings respectively) and then after the for loop you can get the average by dividing the sum with the count. You can get the maximum by initializing a max_readings variable with a reading minimum value (I assume to be 0) and then update the variable whenever the current reading is larger than max_readings (max_readings < readings)

import csv

OPTIONS = []
OPTIONS_READINGS = {}

with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
            OPTIONS_READINGS[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2]) 
        print(OPTIONS)
        OPTIONS_READINGS[row[0]].append(readings)

for option in OPTIONS_READINGS:
    print(option)
    readings = OPTIONS_READINGS[option]
    print('Max readings:', max(readings))
    print('Average readings:', sum(readings) / len(readings))

Edit: Sorry I misread the question. If you want to get the maximum and average of each unique options, there is a more straight forward way which is to use an additional dictionary-type variable OPTIONS_READINGS whose keys are the option names and the values are the list of readings. You can find the maximum and average reading of an options by simply using the expression max(OPTIONS_READINGS[option]) and sum(OPTIONS_READINGS[option]) / len(OPTIONS_READINGS[option]) respectively.

Upvotes: 1

Related Questions