abhi1610
abhi1610

Reputation: 743

Read last valid rows in csv using python

I am working on reading from my csv file using python. But I want to read only specific(last valid) rows from the tail in csv also there is a catch that function should return the entire row only when it is valid. Can anyone help me out with this?

Below is my csv file looks like:

Sr.       Add             A       B       C         D
0   0013A20040D6A141    -308.1  -307.6  -307.7  -154.063    
1   0013A20040DC889A    -308.7  -311.7  -311.7  -154.263    
2   0013A20040DC88C3    -310.1  -310.1  -310.2  -154.863    
3   0013A20040D6A141    -308.2  -306.8  -307.7  -153.863    
4   0013A20040DC889A    -308.7  -311.4  -311.1  -153.263    
5   0013A20040DC88C3      --      --      --       --   
6   0013A20040D6A141    -308.7  -308.3  -305.2  -154.663    

and the code I am trying is:

def last_data(address):
    i = sum(1 for line in open("filename.csv", 'r'))
    print i # number of lines in csv
    cache = {} # dict that saved the last data for particular address
    n = 3

    with open("filename.csv",'r') as f:
        q = deque(f, 3)  # 3 lines read at the end
        qp = [''] * n
        if i +1  >=  n:  # for checking whether the number of lines greater than number of add.
            for k in range(n):

                qp[k] = q[k].split(',')

                if address == str(qp[k][1]): # check for particular address in row
                  # if the row has data than put it into json object with address as key and nested key as columns 'A', 'B', etc.      
                    cache.update({address: {'A':struct.pack('>l',int(float(qp[k][3]) * 10)),
                                            'C':struct.pack('>l',int(float(qp[k][4]) * 10))
                                            }})

                    return cache[address]['A'], cache[address]['C']

For last_data('0013A20040DC88C3') return 5th row with invalid data, where I want to show 2nd row. Can any body tell me how to do this?

Upvotes: 0

Views: 190

Answers (1)

Christian Sauer
Christian Sauer

Reputation: 10899

With pandas it would look like this:

Note: python 2.7. code. Change the import for the StringIo on Python3.

import pandas as pd
from StringIO import StringIO

input = """Sr.       Add             A       B       C         D
0   0013A20040D6A141    -308.1  -307.6  -307.7  -154.063    
1   0013A20040DC889A    -308.7  -311.7  -311.7  -154.263    
2   0013A20040DC88C3    -310.1  -310.1  -310.2  -154.863    
3   0013A20040D6A141    -308.2  -306.8  -307.7  -153.863    
4   0013A20040DC889A    -308.7  -311.4  -311.1  -153.263    
5   0013A20040DC88C3      --      --      --       --   
6   0013A20040D6A141    -308.7  -308.3  -305.2  -154.663 
"""

buffer = StringIO(input)

df = pandas.read_csv(buffer, delim_whitespace=True, na_values=["--"])

# you can customize the behaviour here, e.g. how many invalid values are ok per row.
# see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

df = df.dropna()

Upvotes: 1

Related Questions