Get the earliest date from a column (Python Pandas) after csv.reader

Question

I load some data from a CSV file with multiple columns. In my csv.reader I have an IF function. I am trying to get the earliest date from a specific column (Start Date).

I first load the data:

for row in csv.reader(open('myFile.csv')):
  if row[4] == '56886':
    key = row[4] #key = (row[4], row[33][:4], row[4])
    startDate = row[19]

When I print the column (startDate) I am getting this:

01) I tried using the below:

content = min(content)
print(content)

and I got this in terminal:

02) Then I tried changing my code:

for row in csv.reader(open('myFile.csv',
        parse_dates=['Start Date'], 
        usecols=['Start Date']))
  if row[4] == '56886':
    key = row[4] #key = (row[4], row[33][:4], row[4])
    startDate = row[19]

and I got an invalid syntax error.

03) I tried changing the line to:

pandas.read_csv('myFile.csv', parse_dates=['Start Date'], usecols=['Start Date'])

and I got the same error.

What is the best workaround? I have not found a solution so far.

jezrael · Accepted Answer

I think you need boolean indexing for filtering:

#dont filter all columns by usecols    
df = pd.read_csv('file', parse_dates=['Start Date', 'End Date']) #columns to datetimes

#filter output first by column ID and then get min and max
a = df.loc[ df['ID'] == 56886, 'Start Date'].min()


b = df.loc[ df['ID'] == 56886, 'End Date'].max()

Get the earliest date from a column (Python Pandas) after csv.reader

Answers (2)

Related Questions