tolu

Reputation: 13

How to iterate over a column in pandas and populate another column

I have two columns in df: Start_time and hours_extracted.

from datetime import datetime 
for i in df['Start_time']:
    x =(i.hour)*3600
    y= (i.minute)*60
    z= (i.second)
    
    k=x+y+z
    print (x,y,z, k)
    df['hours_extracted']= k
df.head()

It's only using one value of k to populate the whole hours_extracted column. What should I do?

Upvotes: 1

Views: 1312

Answers (3)

Asad

Reputation: 110

Try to avoid for loops when working with pandas and NumPy; they make the code slower and harder to follow. In the code below, x, y and z are each a pandas Series, which is essentially a single column of a DataFrame. You can add the three Series to get the Series k, and then assign k as a column of the DataFrame df.

# .dt works only if Start_time has a datetime dtype
x = df['Start_time'].dt.hour * 3600
y = df['Start_time'].dt.minute * 60
z = df['Start_time'].dt.second
k = x + y + z
df['hours_extracted'] = k
df.head()
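
For anyone who wants to try this outside the original notebook, here is a minimal self-contained sketch. The sample timestamps are made up, and it assumes Start_time may arrive as strings, in which case it has to be converted with pd.to_datetime before the .dt accessor can be used.

```python
import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({"Start_time": ["2020-01-01 01:02:03", "2020-01-01 10:00:30"]})

# The .dt accessor needs a datetime column, so convert strings first.
df["Start_time"] = pd.to_datetime(df["Start_time"])

# Whole-column arithmetic: no Python-level loop involved.
t = df["Start_time"].dt
df["hours_extracted"] = t.hour * 3600 + t.minute * 60 + t.second
print(df)
```

Each row gets its own value because the right-hand side is a Series with one element per row.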

Upvotes: 0

Tanuj T.V.S

Reputation: 36

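Don't assign the value directly to the column inside the loop: a scalar assignment overwrites the whole column on every iteration. Either use .loc in each iteration, or append the values to a list and assign the list to the column once the loop has finished.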

values = []
for i in df['Start_time']:
    x = i.hour * 3600
    y = i.minute * 60
    z = i.second
    k = x + y + z
    values.append(k)

# one element per row, assigned once after the loop
df['hours_extracted'] = values
df.head()
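
To see why the original loop filled the column with a single value, here is a small sketch on a toy DataFrame. The column names a and b are placeholders, not taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Assigning a scalar broadcasts it to every row, so inside a loop
# only the value from the last iteration survives.
for value in df["a"]:
    df["b"] = value * 10
looped = df["b"].tolist()

# Assigning a list with one element per row fills the column row by row.
df["b"] = [value * 10 for value in df["a"]]
listed = df["b"].tolist()

print(looped, listed)
```

The first result repeats the last computed value in every row, while the second keeps one value per row.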

Upvotes: 0

mujjiga

Reputation: 16856

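If you want to fix your loop, write to one row at a time with .loc:

# .items() yields (index label, value), so .loc hits the right row
# even when the index is not 0, 1, 2, ...
for idx, i in df['Start_time'].items():
    x = i.hour * 3600
    y = i.minute * 60
    z = i.second

    k = x + y + z
    df.loc[idx, 'hours_extracted'] = k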

But a better way is to use apply:

df['hours_extracted'] = df['Start_time'].apply(lambda x: x.hour*3600+x.minute*60+x.second)
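
As a quick sanity check, the sketch below runs the apply version on made-up timestamps and compares it with one possible alternative that is not from this thread: subtracting midnight of the same day (via .dt.normalize()) and taking the total seconds. It assumes Start_time is already a datetime column.

```python
import pandas as pd

# Hypothetical sample data, parsed to datetimes up front.
s = pd.to_datetime(pd.Series(["2020-01-01 01:02:03", "2020-01-01 10:00:30"]))
df = pd.DataFrame({"Start_time": s})

# Row-wise version using apply.
via_apply = df["Start_time"].apply(lambda x: x.hour * 3600 + x.minute * 60 + x.second)

# Alternative: seconds elapsed since midnight of the same day.
midnight = df["Start_time"].dt.normalize()
via_delta = (df["Start_time"] - midnight).dt.total_seconds().astype(int)

print(via_apply.tolist(), via_delta.tolist())
```

Both give the same seconds-since-midnight values on this sample; the second form avoids calling a Python function per row.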

Upvotes: 2
