user2688158

Reputation: 417

Read a CSV file into a list, find the max of each column, and subtract each value from its column max

The biggest restriction is that I cannot import any package other than os. I have a CSV file (a picture of my sample data was attached), which I read into a list using the code below:

my_list = []
with open(input_filename, "r") as infile:
    next(infile)  # skip the first line (header)
    for line in infile:
        line = line.strip()
        my_list.append(line.split(','))

So my_list looks like this:

my_list = [['A', '0.05', '5.62','-1.65', '0.58'], ['B', '0.03', '6.18','-8.56', '5.26'], ['C', '3.26', '5.78','-2.67', '10.25'], ['D', '0.36', '1.25','-1.78', '']]

I need to ignore the first 2 columns and find the max of each remaining column. I also need to subtract each value from the max of its column.

So far I have managed to read the list column-wise using the code below:

col_list = []
for k in range(2, len(my_list[0])):
    for j in my_list:
        col_list.append(j[k])

The problem is that I get a single flat list with all the elements in column order. I'm not sure how I can separate them by column, compute each column's max, and then subtract each element from that max.

The col_list looks like this: ['5.62','6.18','5.78','1.25','-1.65','-8.56','-2.67','-1.78','0.58','5.26','10.25','']

Can someone guide me in the right direction?

Upvotes: 1

Views: 96

Answers (2)

Hiram Foster

Reputation: 128

Use pandas

import pandas as pd

df = pd.read_csv(filename)

# skip the first two columns; max is a method and must be called
for col in df.columns[2:]:
    df[col] = df[col] - df[col].max()

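If you can't use pandas, something like this would work. It computes the max of each column first and then subtracts it from every value in that column. Note that you have an empty cell that I interpreted as 0.

rows = [[float(j) if j != "" else 0 for j in i[2:]] for i in my_list]
col_max = [max(col) for col in zip(*rows)]  # zip(*rows) yields the columns
for i, row in zip(my_list, rows):
    i[2:] = [z - m for z, m in zip(row, col_max)]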
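
For reference, here is a self-contained sketch of the no-pandas approach run on the sample data from the question. The helper names `to_float` and `subtract_from_column_max` are made up for illustration. It assumes an empty cell counts as 0 (as above) and reads "subtract each value from the max" as max minus value, so flip the sign if you want the opposite.

```python
# Sample rows from the question; the first two columns are left untouched.
my_list = [
    ['A', '0.05', '5.62', '-1.65', '0.58'],
    ['B', '0.03', '6.18', '-8.56', '5.26'],
    ['C', '3.26', '5.78', '-2.67', '10.25'],
    ['D', '0.36', '1.25', '-1.78', ''],
]

def to_float(cell):
    # Assumption: an empty cell is treated as 0.
    return float(cell) if cell != "" else 0.0

def subtract_from_column_max(rows, skip=2):
    # Convert the numeric part of every row to floats.
    numeric = [[to_float(c) for c in row[skip:]] for row in rows]
    # zip(*numeric) transposes the rows into columns.
    col_max = [max(col) for col in zip(*numeric)]
    # Keep the leading columns and replace the rest with (max - value).
    result = [row[:skip] + [m - v for v, m in zip(nums, col_max)]
              for row, nums in zip(rows, numeric)]
    return col_max, result

col_max, result = subtract_from_column_max(my_list)
print(col_max)
for row in result:
    print(row)
```

The function returns new rows instead of modifying `my_list` in place, which keeps the original strings around in case you need them later.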

Upvotes: 1

Matthew Gaiser

Reputation: 4803

Something like this should do the job, no libraries used at all. It is not exactly elegant, but it seems to get the job done with your test cases.

with open('test.csv', 'r') as f:
    next(f)  # skip the header line, as in the question
    max_values = [float('-inf')] * 3
    for line in f:
        data = line.strip().split(',')
        # columns 2 to 4 hold the numbers; empty cells are skipped
        for k, cell in enumerate(data[2:5]):
            cell = cell.strip().replace("'", "")
            if cell and float(cell) > max_values[k]:
                max_values[k] = float(cell)

print(max_values)

Presumably you can handle the extra part about subtracting max value? I am not totally sure what you mean by that.
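
If it helps, here is one way the subtraction step could look. This is only a sketch: `column_max_and_diffs` is a hypothetical name, it assumes the header line has already been skipped, it leaves empty cells as `None` rather than guessing a value, and it computes max minus value. It takes any iterable of lines, so the list of strings below stands in for the open file.

```python
def column_max_and_diffs(lines, first_col=2):
    # Parse each line once; empty cells become None so they can be skipped.
    rows = []
    for line in lines:
        cells = [c.strip() for c in line.strip().split(',')]
        rows.append([float(c) if c else None for c in cells[first_col:]])
    # Column-wise max, ignoring missing values (a fully empty column would raise).
    col_max = [max(v for v in col if v is not None) for col in zip(*rows)]
    # Subtract every value from its column max; missing cells stay None.
    diffs = [[None if v is None else m - v for v, m in zip(row, col_max)]
             for row in rows]
    return col_max, diffs

# Inline sample standing in for the file contents (header already skipped).
sample = [
    "A,0.05,5.62,-1.65,0.58\n",
    "B,0.03,6.18,-8.56,5.26\n",
    "C,3.26,5.78,-2.67,10.25\n",
    "D,0.36,1.25,-1.78,\n",
]
col_max, diffs = column_max_and_diffs(sample)
print(col_max)
print(diffs)
```

With a real file you would open it, call `next(f)` to drop the header, and pass `f` in place of `sample`.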

Upvotes: 1
