Reputation: 93

How to find earliest and latest dates from a CSV File [Python]

My CSV file is arranged so that there's a row named "Dates," and below that row is a gigantic column of a million dates, in the traditional format like "4/22/2015" and "3/27/2014".

How can I write a program that identifies the earliest and latest dates in the CSV file, while maintaining the original format (month/day/year)?

I've tried

for line in count_dates:
    dates = line.strip().split(sep="/")
    all_dates.append(dates)

print (all_dates)

I've tried to take away the "/" and replace it with a blank space, but it does not print anything.

Upvotes: 1

Answers (5)

KlayMen TV

Reputation: 11

I think it is more convenient to use pandas for this purpose.

import pandas as pd

df = pd.read_csv('file_name.csv')
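# '%m' and '%d' also parse values without a leading zero, such as "4/22/2015";
# '%-m' is a platform-specific strftime extension and is not valid for parsing.
df['name_of_column_with_date'] = pd.to_datetime(df['name_of_column_with_date'], format='%m/%d/%Y')

print('min_date: {}'.format(min(df['name_of_column_with_date'])))
print('max_date: {}'.format(max(df['name_of_column_with_date'])))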

The built-in functions work well with Pandas Dataframes.

For more on the format argument of pd.to_datetime, see a Python strftime cheat sheet.
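To illustrate the format codes with the standard library alone, here is a small sketch (the sample value "4/2/2015" is made up for illustration). When parsing, %m and %d accept values without a leading zero, but strftime always writes them zero-padded, so rebuilding the string from the date's fields is one portable way to get the unpadded month/day/year form back:

```python
from datetime import datetime

# Hypothetical sample value in the question's month/day/year layout.
raw = "4/2/2015"

# Parsing: %m and %d accept one- or two-digit values.
parsed = datetime.strptime(raw, "%m/%d/%Y")

# Formatting with strftime zero-pads the month and day.
padded = parsed.strftime("%m/%d/%Y")

# Building the string from the fields keeps it unpadded on every platform.
unpadded = "{}/{}/{}".format(parsed.month, parsed.day, parsed.year)

print(padded)    # 04/02/2015
print(unpadded)  # 4/2/2015
```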

Upvotes: 0

EvenLisle

Reputation: 4812

Considering you have a large file, reading it in its entirety into memory is a bad idea.

Read the file line by line, manually keeping track of the earliest and latest dates. Use datetime.datetime.strptime to convert the strings to dates (it takes the string format as a parameter).

import datetime
with open("input.csv") as f:
  f.readline() # get the "Dates" header out of the way
  first = f.readline().strip()
  earliest = datetime.datetime.strptime(first, "%m/%d/%Y")
  latest = datetime.datetime.strptime(first, "%m/%d/%Y")
  for line in f:
    date = datetime.datetime.strptime(line.strip(), "%m/%d/%Y")
    if date < earliest: earliest = date
    if date > latest: latest = date
  # print is a function in Python 3
  print("Earliest date:", earliest)
  print("Latest date:", latest)

Upvotes: 1

Steve Barnes

Reputation: 28380

A bit of an RTFM answer: open the file with the csv library and iterate over it row by row, converting the date field into a date object (see the docs for converting a string to a date object). If a date is earlier than the minimum so far, store it as the new minimum, and do the same for the maximum. The first data row is a special case: its date becomes both the minimum and the maximum.

Or, for some overkill, you could use Pandas to read the file into a data frame, specifying that the column should be parsed as dates, and then just use max and min.
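The row-by-row approach could be sketched roughly as follows. This is only a sketch under assumptions: the function name earliest_and_latest is made up, the column header is taken to be "Dates" as in the question, and the sample data is invented. It keeps the original text next to each parsed date, so the result comes back exactly as it was written in the file:

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed month/day/year layout, as in the question


def earliest_and_latest(lines, column="Dates"):
    """Return (earliest, latest) as the original strings from the file.

    `lines` is any iterable of CSV lines, for example an open file.
    The function name and the default column name are assumptions.
    """
    earliest = latest = None  # each holds a (parsed datetime, original text) pair
    for row in csv.DictReader(lines):
        text = (row[column] or "").strip()
        if not text:
            continue  # skip empty cells
        parsed = datetime.strptime(text, DATE_FORMAT)
        if earliest is None or parsed < earliest[0]:
            earliest = (parsed, text)
        if latest is None or parsed > latest[0]:
            latest = (parsed, text)
    if earliest is None:
        raise ValueError("no dates found")
    return earliest[1], latest[1]


# Invented sample data standing in for the real file.
sample = io.StringIO("Dates\n4/22/2015\n3/27/2014\n12/1/2014\n")
print(earliest_and_latest(sample))  # ('3/27/2014', '4/22/2015')
```

For a real file, pass the open file object instead of the StringIO, e.g. with open("input.csv", newline="") as f: earliest_and_latest(f). Only one row is held in memory at a time, which suits a column of a million dates.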

Upvotes: 0

inspectorG4dget

Reputation: 113965

Let's open the csv file and read out all the dates, then use strptime to turn them into comparable datetime objects (now we can use max). Lastly, let's print out the biggest (latest) date.

import csv
from datetime import datetime as dt

with open('path/to/file') as infile:
    reader = csv.reader(infile)
    next(reader)  # skip the "Dates" header row, which is not a date
    latest = max(dt.strptime(row[0], "%m/%d/%Y") for row in reader)
    print(dt.strftime(latest, "%m/%d/%Y"))

Naturally, you can use min to get the earliest date. However, getting both that way takes two linear passes over the file, and you can do it in just one if you are willing to do some heavy lifting yourself:

import csv
from datetime import datetime as dt

with open('path/to/file') as infile:
    reader = csv.reader(infile)
    next(reader)  # skip the "Dates" header row
    date, *_rest = next(reader)  # the first data row seeds both extremes
    earliest = latest = dt.strptime(date, "%m/%d/%Y")

    for date, *_rest in reader:
        date = dt.strptime(date, "%m/%d/%Y")
        earliest = min(date, earliest)
        latest = max(date, latest)
    print("earliest:", dt.strftime(earliest, "%m/%d/%Y"))
    print("latest:", dt.strftime(latest, "%m/%d/%Y"))

Upvotes: 0

Kathirmani Sukumar

Reputation: 10970

import pandas as pd
import datetime
df = pd.read_csv('file_name.csv')
df['Dates'] = df['Dates'].apply(lambda v: datetime.datetime.strptime(v, '%m/%d/%Y'))
print(df['Dates'].min(), df['Dates'].max())

Upvotes: 1
