Reputation: 51

How to get basic statistics such as sum, mean, median, max and min from list created from csv file

I am very new to programming and brand new to Python. I've read in a CSV file with two columns, and I want to find the mean, median, max and min of the second column. I managed to get the max and min, but I run into trouble with the sum. I'm sure the issue is mostly "syntactic" (is that a word?).

with open(pybankfile, newline="") as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=",")

    # @NOTE: This time, we do not use `next(csv_reader)` because there is no header for this file

    # Read the header row first (skip this step if there is no header)
    csv_header = next(csvfile)
    lst =[]
    print(f"CSV Header: {csv_header}")
    reader = csv.reader(csvfile)
    data = list(reader)
    maxnum = max(data, key=lambda row: int(row[1]))
    minnum = min(data, key=lambda row: int(row[1]))
    tot = sum(data)  # fails here
    # print(f"maximum: {maxnum}")
    print(data)
    sum()
    print(f"minimum: {minnum}")
    print(f"maximum: {maxnum}")
    print(f"Balance: {tot}")

Upvotes: 2

Answers (4)

AntonijMKD

Reputation: 11

import csv
# Add your file here
with open('examplefile.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (remove this line if the file has no header)
    data = list(reader)

#list comprehension to extract the second column
second_column = [int(row[1]) for row in data]

#Built in functions in python 
maxnum = max(second_column)
minnum = min(second_column)
sumcol = sum(second_column)

# The mean is the sum divided by the number of values
meancol = sumcol / len(second_column)

#Median

#We create a new sorted list, while keeping the original unchanged
sorted_second_column = sorted(second_column)

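col_length = len(sorted_second_column)
index = (col_length - 1) // 2
if col_length % 2 == 0:
    # Even count: average the two middle values of the sorted list
    mediancol = (sorted_second_column[index] + sorted_second_column[index + 1]) / 2.0
else:
    # Odd count: take the single middle value of the sorted list
    mediancol = sorted_second_column[index]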

print(maxnum, minnum, sumcol, meancol, mediancol)
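
For what it's worth, the standard library's statistics module provides mean and median directly, so the manual median calculation above isn't strictly needed. Below is a minimal sketch, assuming the file has a header row and integer values in the second column. The sample data and the column_stats helper are made up for illustration and are not part of the original question.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for the real file (header + two columns).
SAMPLE_CSV = "col1,col2\n1,2\n1,2\n3,5\n8,5\n"


def column_stats(csv_text, column=1):
    """Return basic statistics for one integer column of CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    values = [int(row[column]) for row in reader]
    return {
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
    }


stats = column_stats(SAMPLE_CSV)
print(stats)
```

To use this with a real file, read the file's text first (for example with open(path, newline='') and .read()) and pass it to the helper, or adapt the helper to take a file object.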

Upvotes: 1

Gokulaselvan

Reputation: 138

As you are new to Python, consider learning libraries such as NumPy and pandas. They cover most of what you are trying to do here.

    import pandas as pd

    data = pd.read_csv("csv_file.csv")

    data.describe()

You will get output something like this:

count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

For more details, see the pandas .describe() documentation.

Upvotes: 0

Damzaky

Reputation: 10824

I suggest using the pandas package: it is easy for beginners and is also widely used by professionals. You can achieve the same thing as follows:

import pandas as pd 

df = pd.read_csv("filename.csv")

df.describe()

The describe method shows summary statistics (count, mean, standard deviation, min, quartiles and max) for every numeric column.

If you only want a specific column, do this:

df[df.columns[1]].describe() #second column

If you know the column name:

df['column name'].describe()

If you only need one statistic, for instance the mean of the second column, do this:

df[df.columns[1]].mean()
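
The other statistics from the question can be pulled out the same way, one method per statistic. Here is a small sketch, assuming pandas is installed. The in-memory sample data is hypothetical and only stands in for "filename.csv".

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "filename.csv".
sample = io.StringIO("col1,col2\n1,2\n1,2\n3,5\n8,5\n")
df = pd.read_csv(sample)

second = df[df.columns[1]]  # second column as a Series

# Each statistic is a method on the Series.
summary = {
    "sum": second.sum(),
    "mean": second.mean(),
    "median": second.median(),
    "max": second.max(),
    "min": second.min(),
}
print(summary)
```

Note that read_csv treats the first row as the header by default, so no manual skipping is needed here.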

Upvotes: 0

smrf

Reputation: 361

I created a CSV file with the following content:

col1, col2
1,2
1,2
3,5
8,5

You can solve the problem with the following code:

import csv

csvfile = 'data.csv'
data = dict()
with open(csvfile, newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Header row: set up an empty list for each column
            data[0] = []
            data[1] = []
            line_count += 1
        else:
            # Data row: convert both fields to int and store them by column
            data[0].append(int(row[0]))
            data[1].append(int(row[1]))
            line_count += 1
    print(f'Processed {line_count} lines.')

maxnum = max(data[1])
minnum = min(data[1])
tot = sum(data[1])
print(data)
print(f"minimum: {minnum}")
print(f"maximum: {maxnum}")
print(f"Balance: {tot}")

However, I recommend using pandas for reading and working with CSV files. You can find a complete tutorial at the following link: pandas tutorial

For the mean and median, I recommend using NumPy. You can find a tutorial at the following link: numpy tutorial
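
As a rough illustration of the NumPy suggestion, assuming NumPy is installed, the mean and median of the second column of the sample file above could be computed like this. The values list is typed in by hand here rather than read from the file.

```python
import numpy as np

# Second-column values from the sample file above, entered by hand.
values = [2, 2, 5, 5]

mean_value = np.mean(values)      # arithmetic mean
median_value = np.median(values)  # middle value (average of the two middle values here)
print(mean_value, median_value)
```

In the code above, data[1] could be passed in place of values.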

Upvotes: 0
