Reputation: 11
I have a csv with the following sample data:
id bb_id cc_id datetime
-------------------------
1 11 44 2019-06-09
2 33 55 2020-06-09
3 22 66 2020-06-09
4 11 44 2019-06-09
5 11 44 2020-02-22
Let's say the condition is if bb_id == 11 and cc_id == 44
get the latest record, which is:
11 44 2020-02-22
How do I get this from the csv?
What I did:
with open('sample.csv') as csv_file
for indx, data in enumerate(csv.DictReader(csv_file)):
# check if the conditional data is in the file?
if data['bb_id'] == 11 and data['cc_id'] == 44:
# sort the data by date? or should I store all the relevant data before hand in a data structure like list and then apply sort on it? could I avoid that? as I need to perform this interactively multiple times
Upvotes: 0
Views: 293
Reputation: 1386
The simplest way I know:
import pandas as pd
import pandasql as ps
sample_df = pd.read_csv(<filepath>);
ps.sqldf("""select *
from (select *
from sample_df
where bb_id = 11
and cc_id = 44
order by datetime desc) limit 1""", locals())
Upvotes: 0
Reputation: 11360
If you really want to do this in regular python, something like this is simple:
with open('sample.csv') as csv_file:
list_of_dates = []
for indx, data in enumerate(csv.DictReader(csv_file)):
if data['bb_id'] == 11 and data['cc_id'] == 44:
list_of_dates.append(data['datetime'])
sorted = list_of_dates.sort()
print( sorted[-1] ) # you already know the values for bb and cc
Also try:
def sort_func(e):
return e['datetime']
with open('sample.csv') as csv_file:
list_of_dates = []
for indx, data in enumerate(csv.DictReader(csv_file)):
if data['bb_id'] == 11 and data['cc_id'] == 44:
list_of_dates.append(data)
sorted = list_of_dates.sort(key=sort_func)
print( sorted[-1] )
Upvotes: 0
Reputation: 781068
Put all the selected records in a list, then use the max()
function with the date as the key.
selected_rows = []
with open('sample.csv') as csv_file
for data in csv.DictReader(csv_file):
# check if the conditional data is in the file?
if data['bb_id'] == 11 and data['cc_id'] == 44:
selected_rows.append(data)
latest = max(selected_rows, key = lambda x: x['datetime'])
print(latest)
Upvotes: 1