Reputation: 51
I am very new to programming and brand new to Python. I've read in a CSV file with two columns, and I want to find the mean, median, max and min of the second column. I managed to get the max and min, but I run into trouble with the sum. I'm sure the issue is mostly "syntactic" (is that a word?).
with open(pybankfile, newline="") as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=",")
    # @NOTE: This time, we do not use `next(csv_reader)` because there is no header for this file
    # Read the header row first (skip this step if there is no header)
    csv_header = next(csvfile)
    lst = []
    print(f"CSV Header: {csv_header}")
    reader = csv.reader(csvfile)
    data = list(reader)
    maxnum = max(data, key=lambda row: int(row[1]))
    minnum = min(data, key=lambda row: int(row[1]))
    tot = sum(data)  # fails here
    # print(f"maximum: {maxnum}")
    print(data)
    sum()
    print(f"minimum: {minnum}")
    print(f"maximum: {maxnum}")
    print(f"Balance: {tot}")
Upvotes: 2
Views: 2357
Reputation: 11
import csv
# Add your file here
with open('examplefile.csv', newline='') as f:
    reader = csv.reader(f)
    # If the file has a header row, skip it here with next(reader)
    data = list(reader)
# List comprehension to extract the second column as integers
second_column = [int(row[1]) for row in data]
# Built-in functions in Python
maxnum = max(second_column)
minnum = min(second_column)
sumcol = sum(second_column)
# The mean is the sum divided by the length
meancol = sumcol / len(second_column)
# Median
# We create a new sorted list, while keeping the original unchanged
sorted_second_column = sorted(second_column)
col_length = len(sorted_second_column)
index = (col_length - 1) // 2
if col_length % 2 == 0:
    # Even length: average the two middle values of the sorted list
    mediancol = (sorted_second_column[index] + sorted_second_column[index + 1]) / 2.0
else:
    # Odd length: take the middle value of the sorted list
    mediancol = sorted_second_column[index]
print(maxnum, minnum, sumcol, meancol, mediancol)
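The hand-written mean and median above can also be left to the standard library's statistics module. This is only a minimal sketch, assuming the second column holds integers and the file has no header row; the sample rows and the names `sample` and `rows` are made up for illustration and stand in for the real file.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for the CSV file (no header row).
sample = io.StringIO("a,2\nb,2\nc,5\nd,5\ne,10\n")
rows = list(csv.reader(sample))

# csv.reader yields strings, so convert the second column to int first.
second_column = [int(row[1]) for row in rows]

total = sum(second_column)
mean = statistics.mean(second_column)
# statistics.median sorts internally and handles even-length lists.
median = statistics.median(second_column)

print(min(second_column), max(second_column), total, mean, median)
```

Calling `sum()` on the raw rows fails because each row is a list of strings; converting to a flat list of numbers first is what makes `sum`, `mean` and `median` work.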
Upvotes: 1
Reputation: 138
Since you are new to Python, try learning libraries like NumPy and pandas. They cover most of what you want to do here.
import pandas as pd
data = pd.read_csv("csv_file.csv")
data.describe()
You will get output something like this:
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
For more details, see the pandas .describe() documentation.
Upvotes: 0
Reputation: 10824
I suggest using the pandas package because it is easy for beginners and is also used by professionals. You can achieve the same thing as follows:
import pandas as pd
df = pd.read_csv("filename.csv")
df.describe()
The describe method shows summary statistics (count, mean, std, min, quartiles and max) for each numeric column.
If you just want a specific column, do this:
df[df.columns[1]].describe()  # second column
If you know the column name:
df['column name'].describe()
If you only need one statistic, for instance the mean of the second column, do this:
df[df.columns[1]].mean()
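To tie this back to the question, here is a small sketch that pulls the individual statistics out of the second column. It assumes pandas is installed and that the file has a header row; the in-memory text `csv_text` and the `stats` dictionary are made up for illustration and stand in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "filename.csv" (with a header row).
csv_text = "col1,col2\n1,2\n1,2\n3,5\n8,5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Select the second column by position, whatever its name is.
second = df.iloc[:, 1]

stats = {
    "min": second.min(),
    "max": second.max(),
    "sum": second.sum(),
    "mean": second.mean(),
    "median": second.median(),
}
print(stats)
```

The median also appears as the 50% row in the output of describe().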
Upvotes: 0
Reputation: 361
I created a CSV file with the following content:
col1, col2
1,2
1,2
3,5
8,5
You can solve the problem with the following code:
import csv
csvfile = 'data.csv'
data = dict()
with open(csvfile) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            data[0] = []
            data[1] = []
            line_count += 1
        else:
            data[0].append(int(row[0]))
            data[1].append(int(row[1]))
            line_count += 1
    print(f'Processed {line_count} lines.')
maxnum = max(data[1])
minnum = min(data[1])
tot = sum(data[1])
print(data)
print(f"minimum: {minnum}")
print(f"maximum: {maxnum}")
print(f"Balance: {tot}")
However, I recommend using pandas for reading and working with CSV files. You can find a complete tutorial at the following link: pandas tutorial
For the mean and median, I also recommend NumPy. You can find a tutorial at the following link: numpy tutorial
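As a rough illustration of the NumPy suggestion, the sketch below computes the mean and median of the second column. It assumes NumPy is installed; the array `col2` is a made-up stand-in for `data[1]` from the code above, using the values from the sample file.

```python
import numpy as np

# Hypothetical values standing in for data[1] (the second column of the sample file).
col2 = np.array([2, 2, 5, 5])

mean = np.mean(col2)
median = np.median(col2)

print(col2.min(), col2.max(), col2.sum(), mean, median)
```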
Upvotes: 0