Reputation: 743
I am working on reading from my csv file using python. But I want to read only specific(last valid) rows from the tail in csv also there is a catch that function should return the entire row only when it is valid. Can anyone help me out with this?
Below is my csv file looks like:
Sr. Add A B C D
0 0013A20040D6A141 -308.1 -307.6 -307.7 -154.063
1 0013A20040DC889A -308.7 -311.7 -311.7 -154.263
2 0013A20040DC88C3 -310.1 -310.1 -310.2 -154.863
3 0013A20040D6A141 -308.2 -306.8 -307.7 -153.863
4 0013A20040DC889A -308.7 -311.4 -311.1 -153.263
5 0013A20040DC88C3 -- -- -- --
6 0013A20040D6A141 -308.7 -308.3 -305.2 -154.663
and the code I am trying is:
def last_data(address):
i = sum(1 for line in open("filename.csv", 'r'))
print i # number of lines in csv
cache = {} # dict that saved the last data for particular address
n = 3
with open("filename.csv",'r') as f:
q = deque(f, 3) # 3 lines read at the end
qp = [''] * n
if i +1 >= n: # for checking whether the number of lines greater than number of add.
for k in range(n):
qp[k] = q[k].split(',')
if address == str(qp[k][1]): # check for particular address in row
# if the row has data than put it into json object with address as key and nested key as columns 'A', 'B', etc.
cache.update({address: {'A':struct.pack('>l',int(float(qp[k][3]) * 10)),
'C':struct.pack('>l',int(float(qp[k][4]) * 10))
}})
return cache[address]['A'], cache[address]['C']
For last_data('0013A20040DC88C3')
return 5th row with invalid data, where I want to show 2nd row. Can any body tell me how to do this?
Upvotes: 0
Views: 190
Reputation: 10899
With pandas it would look like this:
Note: python 2.7. code. Change the import for the StringIo on Python3.
import pandas as pd
from StringIO import StringIO
input = """Sr. Add A B C D
0 0013A20040D6A141 -308.1 -307.6 -307.7 -154.063
1 0013A20040DC889A -308.7 -311.7 -311.7 -154.263
2 0013A20040DC88C3 -310.1 -310.1 -310.2 -154.863
3 0013A20040D6A141 -308.2 -306.8 -307.7 -153.863
4 0013A20040DC889A -308.7 -311.4 -311.1 -153.263
5 0013A20040DC88C3 -- -- -- --
6 0013A20040D6A141 -308.7 -308.3 -305.2 -154.663
"""
buffer = StringIO(input)
df = pandas.read_csv(buffer, delim_whitespace=True, na_values=["--"])
# you can customize the behaviour here, e.g. how many invalid values are ok per row.
# see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
df = df.dropna()
Upvotes: 1