Mahsolid
Mahsolid

Reputation: 433

Python - loop through a csv file row values

I have a csv file like the following (test.csv) with two columns.

338,800
338,550
339,670
340,600 
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290

From column 1, I wanted to read current row and compare it with the value of the previous row. If it is greater OR equal, continue comparing and if the value of the current cell is smaller than the previous row - then i wanted to the value of the second column in the same row.

Next I wanted to divided the larger value we got in column 1 by the value in the same cell of column two. Let me make it clear.

For example in the table given above: the smaller value we will get depending on my requirement from Column 1 is 327 (because 327 is smaller than the previous value 340) - and then we take 500 (which is the corresponding cell value on column 2). Finally we divide 340 by 500 and get the value 0.68. My python script should exit right after we print the value to the console.

Currently, I am using the following script in bash, and it works fine

awk -F, '$1<p && $2!=0{ 
val=$2/p    
if(val>=0.8 && val<=0.9)
    {
        print "A"
    }
else if(val==0.7)
    {
        print "B"
    }
else if(val>=0.5 && val <0.7)
    {
        print "C" 

    }
else if(val==0.5)
    {
        print "E"
    }
else
    {
        print "D" 
    }
exit
}
{ 
    p=$1 
}' test.csv

but I wanted to do it with python and i would appreciate for any help. Here is my approach

import csv

f = open("test.csv", "r+")
ff = csv.reader(f)

previous_line = ff.next()
while(True):
    try:
        current_line = ff.next()
        if previous_line <= current_line:
            print "smaller value"
    except StopIteration:
        break

Upvotes: 1

Views: 12332

Answers (3)

BoarGules
BoarGules

Reputation: 16942

I recommend you use csv.Reader's built-in iteration rather than calling .next() directly. And your code should not test normal floats for equality. In any language, that's not just a Python thing. Also, a calculated value of 0.79 will result in D which may not be what you intend.

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            print category(result)
            break

Edit in response to a change to the OP's request

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    results = []
    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            results.append(result)
            print category(result)
            previous_value = value
    print (results)
    print (sum(results))
    print (category(sum(results) / len(results)))

I've had to guess at the logic you want to use for resetting the previous value because your original had the loop break at the first result. And I've no idea how you want end-of-file handled. This revision produces the following output:

C
D
A
A
A
D
[0.68, 0.7604651162790698, 0.86, 0.8820058997050148, 0.8477611940298507, 0.9129032258064517]
4.94313543582
A

As you can see, there are definitely more than two values in results.

Upvotes: 3

Mr.Pacman
Mr.Pacman

Reputation: 360

If it is a .csv file working with pandas could give you more control.

import numpy as np
import pandas as pd

pd.read_csv("filename.csv") # to read a .csv file into a dataframe

However, for this example I am not using pd.read_csv() function. Instead, I am creating a dataframe from a 2D numpy array like so,

dataframe = pd.DataFrame(np.array([[338,800],
    [338,550],
    [339,670],
    [340,600], 
    [327,500],
    [301,430],
    [299,350],
    [284,339],
    [284,338],
    [283,335],
    [283,330],
    [283,310],
    [282,310],
    [282,300],
    [282,300],
    [283,290]]))

Now that I have a dataframe object, I can manipulate it just like other object types in python. I can call pandas specific functions to work on the dataframe for the results I want.

def compare_and_divide(df):
    for i in range(len(df)-1):
        # df[0] for all values in col 0 .iloc[i] for value in row
        if df[0].iloc[i+1] >= df[0].iloc[i]:                                 
            continue     
        else:
            df[0].iloc[i+1] = df[0].iloc[i]

    return df[0].div(df[1]) # .div() function to divide values in col 0 by col 1

compare_and_divide(dataframe)   

0     0.422500
1     0.614545
2     0.505970
3     0.566667
4     0.680000 # 340/500 value mentioned in the question
5     0.790698
6     0.971429
7     1.002950
8     1.005917
9     1.014925
10    1.030303
11    1.096774
12    1.096774
13    1.133333
14    1.133333
15    1.172414
dtype: float64

Upvotes: 0

titiro89
titiro89

Reputation: 2108

col_1 = []
col_2 = []
with open("test.csv", "r+") as f:
    for elem in f.readlines():
        col_1.append(float(elem.split(",")[0]))
        col_2.append(float(elem.split(",")[1]))

condition = True
i=0
while condition:
    if (col_1[i+1]-col_1[i]<0):
        print col_1[i]/col_2[i+1]
        condition = False
    i+=1

Upvotes: 0

Related Questions