Datacrawler
Datacrawler

Reputation: 2876

Get the earliest date from a column (Python Pandas) after csv.reader

I load some data from a CSV file with multiple columns. In my csv.reader I have an IF function. I am trying to get the earliest date from a specific column (Start Date).

I first load the data:

for row in csv.reader(open('myFile.csv')):
  if row[4] == '56886':
    key = row[4] #key = (row[4], row[33][:4], row[4])
    startDate = row[19]

When I print the column (startDate) I am getting this:

enter image description here

01) I tried using the below:

content = min(content)
print(content)

and I got this in terminal:

enter image description here

02) Then I tried changing my code:

for row in csv.reader(open('myFile.csv',
        parse_dates=['Start Date'], 
        usecols=['Start Date']))
  if row[4] == '56886':
    key = row[4] #key = (row[4], row[33][:4], row[4])
    startDate = row[19]

and I got an invalid syntax error.

03) I tried changing the line to:

pandas.read_csv('myFile.csv', parse_dates=['Start Date'], usecols=['Start Date'])

and I got the same error.

What is the best workaround? I have not found a solution so far.

Upvotes: 7

Views: 18773

Answers (2)

jezrael
jezrael

Reputation: 862681

I think you need boolean indexing for filtering:

#dont filter all columns by usecols    
df = pd.read_csv('file', parse_dates=['Start Date', 'End Date']) #columns to datetimes

#filter output first by column ID and then get min and max
a = df.loc[ df['ID'] == 56886, 'Start Date'].min()


b = df.loc[ df['ID'] == 56886, 'End Date'].max()

Upvotes: 11

Sosel
Sosel

Reputation: 1718

An example using pandas to convert one single item:

pd.to_datetime("08/27/2017")

An example using pandas to convert one list of strings:

times = []
for i in range(30):
    times.append(str(i+1)+"/01/2016")
datetimes = pd.to_datetime(times)
min(datetimes )

Upvotes: 0

Related Questions