Reputation: 13
I have two columns in df, Start_time and hours_extracted:
from datetime import datetime
for i in df['Start_time']:
    x = i.hour * 3600
    y = i.minute * 60
    z = i.second
    k = x + y + z
    print(x, y, z, k)
    df['hours_extracted'] = k
df.head()
It's just using one value of k to populate the whole hours_extracted column. What should I do?
Upvotes: 1
Views: 1312
Reputation: 110
You should try to avoid for loops when working with Pandas and NumPy; they make the code inefficient and confusing. In the code below, x, y and z are each a Pandas Series, which is essentially a single column of a DataFrame. You can add the three Series to create the Series k, and then insert that Series as a column of the DataFrame df.
# The .dt accessor requires Start_time to have a datetime dtype
x = df['Start_time'].dt.hour * 3600
y = df['Start_time'].dt.minute * 60
z = df['Start_time'].dt.second
k = x + y + z
df['hours_extracted'] = k
df.head()
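For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample timestamps are made up, and the pd.to_datetime call is an assumption for the case where Start_time is stored as strings; skip it if the column is already datetime.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({"Start_time": ["2021-01-01 01:02:03", "2021-01-01 10:00:30"]})

# .dt only works on datetime columns, so convert string timestamps first.
df["Start_time"] = pd.to_datetime(df["Start_time"])

# Compute seconds since midnight for every row at once, with no Python loop.
start = df["Start_time"].dt
df["hours_extracted"] = start.hour * 3600 + start.minute * 60 + start.second

print(df)
```

With these two rows the new column holds 3723 and 36030.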
Upvotes: 0
Reputation: 36
You must not assign the value directly to the column, because that overwrites every row with the current k. Either use .loc in each iteration, or append the values to a list and assign the list to the DataFrame column after the loop:
from datetime import datetime
l = []
for i in df['Start_time']:
    x = i.hour * 3600
    y = i.minute * 60
    z = i.second
    k = x + y + z
    l.append(k)
df['hours_extracted'] = l
df.head()
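To see why the original loop ends up with a single value everywhere: assigning a scalar to a column broadcasts it to every row, while assigning a list gives each row its own value. A small sketch with made-up timestamps (the names broadcast and seconds are just for illustration):

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame(
    {"Start_time": pd.to_datetime(["2021-01-01 01:02:03", "2021-01-01 10:00:30"])}
)

# Assigning a scalar broadcasts it to every row of the column.
df["hours_extracted"] = 36030
broadcast = df["hours_extracted"].tolist()

# Collecting one value per row gives a list with the same length as df.
seconds = [t.hour * 3600 + t.minute * 60 + t.second for t in df["Start_time"]]
df["hours_extracted"] = seconds

print(broadcast, seconds)
```

The list must have exactly as many elements as the DataFrame has rows, otherwise pandas raises a ValueError.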
Upvotes: 0
Reputation: 16856
If you want to fix your code, then you have to use:
for l, i in enumerate(df['Start_time']):
    x = i.hour * 3600
    y = i.minute * 60
    z = i.second
    k = x + y + z
    df.loc[l, 'hours_extracted'] = k
But a better way is:
df['hours_extracted'] = df['Start_time'].apply(lambda x: x.hour*3600+x.minute*60+x.second)
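One caveat on the loop version: enumerate yields positions 0, 1, 2, ..., and .loc looks rows up by label, so the two only line up when df has the default integer index. If that is not guaranteed, a sketch like the following iterates over the index labels instead. The index values 10 and 20 and the sample timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame(
    {"Start_time": pd.to_datetime(["2021-01-01 01:02:03", "2021-01-01 10:00:30"])},
    index=[10, 20],
)

# Create the column up front so .loc only fills in existing cells.
df["hours_extracted"] = 0

# Series.items() yields (index label, value) pairs, so .loc gets real labels.
for label, t in df["Start_time"].items():
    df.loc[label, "hours_extracted"] = t.hour * 3600 + t.minute * 60 + t.second

print(df)
```

The apply one-liner does not have this problem, since it never touches the index.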
Upvotes: 2