Ghozally

Reputation: 93

Looping function in dataframe

I have data stored in a DataFrame, and I have a function that processes each row and returns the result as a new DataFrame.

import pandas as pd  

def get_data(start_time):  
    from datetime import datetime, timedelta
    start_time = datetime.strptime(start_time)
    ten_second = start_time + timedelta(0,10)
    twenty_second = start_time + timedelta(0,20)
    combine = {'start' : ten_second, 'end' : twenty_second}
    rsam=pd.DataFrame(combine, index=[0])
    return(rsam)


df_event = pd.DataFrame([["2019-01-10 13:16:25"],
             ["2019-01-29 13:56:21"],
             ["2019-02-09 14:41:21"],
             ["2019-02-07 11:28:50"]])

temp=[]
for index, row in df_event.iterrows():
    temp=get_data(row[0])

I read online that iterrows() is the suggested way to do this, but my loop still raises an error.

What I expect in the temp variable:

Index          ten_second            twenty_second
  0         2019-01-10 13:16:35   2019-01-10 13:16:45
  1         2019-01-29 13:56:31   2019-01-29 13:56:41
  2         2019-02-09 14:41:31   2019-02-09 14:41:41
  3         2019-02-07 11:29:00   2019-02-07 11:29:10

Upvotes: 1

Views: 47

Answers (1)

Erfan

Reputation: 42906

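You don't need iterrows or your function. Convert the column to datetime, then simply add a pd.Timedelta:

# the column holds strings, so convert it to datetime first
df_event[0] = pd.to_datetime(df_event[0])

c1 = df_event[0] + pd.Timedelta('10s')
c2 = df_event[0] + pd.Timedelta('20s')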

temp = pd.DataFrame({'ten_second':c1,
                     'twenty_second':c2})

Output

           ten_second       twenty_second
0 2019-01-10 13:16:35 2019-01-10 13:16:45
1 2019-01-29 13:56:31 2019-01-29 13:56:41
2 2019-02-09 14:41:31 2019-02-09 14:41:41
3 2019-02-07 11:29:00 2019-02-07 11:29:10

Or write a function if you need more of these columns:

def add_time(dataframe, col, seconds):
    # col must already be a datetime column; seconds is an offset string such as '10s'
    return dataframe[col] + pd.Timedelta(seconds)

temp = pd.DataFrame({'ten_second':add_time(df_event, 0, '10s'),
                     'twenty_second':add_time(df_event, 0, '20s')})

Output

           ten_second       twenty_second
0 2019-01-10 13:16:35 2019-01-10 13:16:45
1 2019-01-29 13:56:31 2019-01-29 13:56:41
2 2019-02-09 14:41:31 2019-02-09 14:41:41
3 2019-02-07 11:29:00 2019-02-07 11:29:10
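
If the number of columns keeps growing, the same idea can be driven by a mapping of column names to offsets. This is only a sketch: the `offsets` dict and the extra `thirty_second` column are made up for illustration, and it assumes the timestamps start out as strings, as in the question.

```python
import pandas as pd

df_event = pd.DataFrame([["2019-01-10 13:16:25"],
                         ["2019-01-29 13:56:21"],
                         ["2019-02-09 14:41:21"],
                         ["2019-02-07 11:28:50"]])

# Convert the string column to datetime once, up front
times = pd.to_datetime(df_event[0])

# Hypothetical mapping: new column name -> offset string understood by pd.Timedelta
offsets = {"ten_second": "10s", "twenty_second": "20s", "thirty_second": "30s"}

# One vectorized addition per column, no row-wise loop
temp = pd.DataFrame({name: times + pd.Timedelta(offset)
                     for name, offset in offsets.items()})
print(temp)
```

Each column is still computed in a single vectorized operation, so adding another offset is just one more entry in the dict.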

Or we can do this in one line using assign:

temp = pd.DataFrame().assign(ten_second=df_event[0] + pd.Timedelta('10s'),
                             twenty_second=df_event[0] + pd.Timedelta('20s'))
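
If you really do need a row-by-row function (say, for logic that cannot be vectorized), the original loop has two likely problems: datetime.strptime requires a format string, and temp is overwritten on every iteration instead of accumulating results. A minimal sketch of a repaired version follows; the format string is an assumption based on the sample data, and the column names follow the expected output in the question.

```python
from datetime import datetime, timedelta

import pandas as pd


def get_data(start_time):
    # strptime needs an explicit format; this one matches "2019-01-10 13:16:25"
    start = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    return pd.DataFrame({"ten_second": [start + timedelta(seconds=10)],
                         "twenty_second": [start + timedelta(seconds=20)]})


df_event = pd.DataFrame([["2019-01-10 13:16:25"],
                         ["2019-01-29 13:56:21"],
                         ["2019-02-09 14:41:21"],
                         ["2019-02-07 11:28:50"]])

# Collect one single-row frame per value, then concatenate once at the end
frames = [get_data(value) for value in df_event[0]]
temp = pd.concat(frames, ignore_index=True)
print(temp)
```

This gives the same table as the vectorized versions, but it will be noticeably slower on large frames, so prefer pd.Timedelta where you can.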

Upvotes: 2
