Felix Jassler

Reputation: 1531

Pandas Iterate through rows from specified row number

I want to read data from a pandas dataframe by iterating through the rows starting from a specific row number. I know there's df.iterrows(), but it doesn't let me specify from where I want to start iterating.

In my specific case, I have a csv file that might look something like this:

Date, Temperature
21/08/2017 17:00:00,5.53
21/08/2017 18:00:00,5.58
21/08/2017 19:00:00,4.80
21/08/2017 20:00:00,4.59
21/08/2017 21:00:00,3.72
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
22/08/2017 02:00:00,2.75
22/08/2017 03:00:00,2.79
22/08/2017 04:00:00,2.76
22/08/2017 05:00:00,2.76
22/08/2017 06:00:00,3.06
22/08/2017 07:00:00,3.88

I want to loop through every row from a specific point in time on (let's say midnight of August 22nd), so I tried implementing it like this:

import pandas

df = pandas.read_csv('file.csv')
start_date = '22/08/2017 00:00:00'

# since it's sorted, I figured I could use binary search
result = pandas.Series(df['Date']).searchsorted(start_date)

result[0] actually gives me the correct number.

I guess what I could do then is just increment that number and access the row through df.iloc[[x]], but I feel dirty doing that.

for x in range(result[0], len(df)):
    row = df.iloc[[x]]  # searchsorted returns a position, so index by position

All answers I've found so far only show how to iterate the whole table.

Upvotes: 4

Views: 9940

Answers (2)

kev8484

Reputation: 648

Just filter your dataframe before calling iterrows():

# dayfirst=True because the CSV stores dates as dd/mm/yyyy
df['Date'] = pandas.to_datetime(df['Date'], dayfirst=True)
for idx, row in df[df['Date'] >= '2017-08-22'].iterrows():
    #
    # Whatever you want to do in the loop goes here
    #

Note that you don't need to convert the filter value '2017-08-22' to a datetime object yourself: when a datetime column is compared with a string, pandas parses the string as a timestamp.
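For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same filtering idea. The inline CSV_TEXT sample and the names mask and selected are assumptions made for illustration, and the explicit format string assumes every row follows dd/mm/yyyy hh:mm:ss as in the question.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for file.csv
CSV_TEXT = """Date,Temperature
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# An explicit format avoids any day/month ambiguity in the source data
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")

# Boolean mask: True for every row at or after midnight on 22 August
mask = df["Date"] >= pd.Timestamp("2017-08-22")
selected = df[mask]

# Iterate only over the rows that passed the filter
for idx, row in selected.iterrows():
    print(idx, row["Date"], row["Temperature"])
```

The filtered frame keeps the original row labels, so idx here still refers to positions in the unfiltered data as long as the default integer index is in use.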

Upvotes: 2

piRSquared

Reputation: 294258

Turn Date into datetime. Set Date as the index:

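# pd is the usual alias (import pandas as pd); dayfirst=True because the CSV uses dd/mm/yyyy
df.Date = pd.to_datetime(df.Date, dayfirst=True)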

df = df.set_index('Date')

Then:

for date, row in df['22/08/2017 00:00:00':].iterrows():
    print(date.strftime('%c'), row.squeeze())

Tue Aug 22 00:00:00 2017 3.07
Tue Aug 22 01:00:00 2017 2.8
Tue Aug 22 02:00:00 2017 2.75
Tue Aug 22 03:00:00 2017 2.79
Tue Aug 22 04:00:00 2017 2.76
Tue Aug 22 05:00:00 2017 2.76
Tue Aug 22 06:00:00 2017 3.06
Tue Aug 22 07:00:00 2017 3.88
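
If you would rather start from a row number, as the question asks, a positional variant is also possible. The sketch below is only one way to do it and assumes the Date column is already sorted ascending; the inline SAMPLE_CSV data and the names frame, start and tail are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for file.csv
SAMPLE_CSV = """Date,Temperature
21/08/2017 22:00:00,3.95
21/08/2017 23:00:00,3.11
22/08/2017 00:00:00,3.07
22/08/2017 01:00:00,2.80
"""

frame = pd.read_csv(io.StringIO(SAMPLE_CSV))
frame["Date"] = pd.to_datetime(frame["Date"], format="%d/%m/%Y %H:%M:%S")

# Binary search for the first position whose date is >= the start date.
# This is only meaningful if the column is sorted ascending.
start = int(frame["Date"].searchsorted(pd.Timestamp("2017-08-22"), side="left"))

# Slice by position once, then iterate over the remaining rows
tail = frame.iloc[start:]
for row in tail.itertuples(index=False):
    print(row.Date, row.Temperature)
```

Slicing once with iloc avoids looking up each row individually inside the loop, and itertuples is generally faster than iterrows when you only need to read values.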

Upvotes: 6
