Ming Chow
Ming Chow

Reputation: 11

Python csv: find the latest record with a condition

I have a csv with the following sample data:

id bb_id cc_id datetime
-------------------------
1  11    44    2019-06-09
2  33    55    2020-06-09
3  22    66    2020-06-09
4  11    44    2019-06-09
5  11    44    2020-02-22

Let's say the condition is if bb_id == 11 and cc_id == 44 get the latest record, which is:

11    44    2020-02-22

How do I get this from the csv?

What I did:

 with open('sample.csv') as csv_file
     for indx, data in enumerate(csv.DictReader(csv_file)):
         # check if the conditional data is in the file?
         if data['bb_id'] == 11 and data['cc_id'] == 44:
                     # sort the data by date? or should I store all the relevant data before hand in a data structure like list and then apply sort on it? could I avoid that? as I need to perform this interactively multiple times

Upvotes: 0

Views: 293

Answers (3)

Lars Skaug
Lars Skaug

Reputation: 1386

The simplest way I know:

import pandas as pd
import pandasql as ps

sample_df = pd.read_csv(<filepath>);

ps.sqldf("""select *
            from (select * 
            from sample_df
            where bb_id = 11 
             and cc_id = 44
             order by datetime desc) limit 1""", locals())

Upvotes: 0

GAEfan
GAEfan

Reputation: 11360

If you really want to do this in regular python, something like this is simple:

with open('sample.csv') as csv_file:
    list_of_dates = []
    for indx, data in enumerate(csv.DictReader(csv_file)):
         if data['bb_id'] == 11 and data['cc_id'] == 44:
             list_of_dates.append(data['datetime'])

   sorted = list_of_dates.sort()
   print( sorted[-1] ) # you already know the values for bb and cc

Also try:

def sort_func(e):
    return e['datetime']

with open('sample.csv') as csv_file:
    list_of_dates = []
    for indx, data in enumerate(csv.DictReader(csv_file)):
         if data['bb_id'] == 11 and data['cc_id'] == 44:
             list_of_dates.append(data)

    sorted = list_of_dates.sort(key=sort_func)
    print( sorted[-1] )

Upvotes: 0

Barmar
Barmar

Reputation: 781068

Put all the selected records in a list, then use the max() function with the date as the key.

selected_rows = []
with open('sample.csv') as csv_file
    for data in csv.DictReader(csv_file):
        # check if the conditional data is in the file?
        if data['bb_id'] == 11 and data['cc_id'] == 44:
            selected_rows.append(data)
latest = max(selected_rows, key = lambda x: x['datetime'])
print(latest)

Upvotes: 1

Related Questions