Reputation: 1520
I've looked through a bunch of similar questions, but I cannot figure out how to actually apply the principles to my own case. I'm therefore trying to figure out a simple example I can work from - basically I need the idiots' guide before I can look at more complex examples
Consider a dataframe that contains a list of names and times, and a known start time. I then want to update the dataframe with the finish time, which is calculated from starttime + Time
import pandas as pd
import datetime
df = pd.DataFrame({"Name": ["Kate","Sarah","Isabell","Connie","Elsa","Anne","Lin"],
"Time":[3, 6,1, 7, 23,3,4]})
starttime = datetime.datetime.strptime('2020-02-04 00:00:00', '%Y-%m-%d %H:%M:%S')
I know that for each case I can calculate the finish time using
finishtime = starttine + datetime.datetime.timedelta(minutes = df.iloc[0,1])
what I can't figure out is how to use this while iterating over the df rows and updating a third column in the dataframe with the output.
I tried
df["FinishTime"] = np.nan
for row in df.itertuples():
df.at[row,"FinishTime"] = starttine + datetime.datetime.timedelta(minutes = row.Time)
but it gave a lot of errors I couldn't unravel. How am I meant to do this?
I am aware that the advice to iterating over a dataframe is don't - I'm not committed to iterating, I just need some way to calculate that final column and add it to the dataframe. My real data is about 200k lines.
Upvotes: 1
Views: 98
Reputation: 5183
Use pd.to_timedelta()
import datetime
starttime = datetime.datetime.strptime('2020-02-04 00:00:00', '%Y-%m-%d %H:%M:%S')
df = pd.DataFrame({"Name": ["Kate","Sarah","Isabell","Connie","Elsa","Anne","Lin"],
"Time":[3, 6,1, 7, 23,3,4]})
df.Time = pd.to_timedelta(df.Time, unit='m')
# df = df.assign(FinishTime = df.Time + starttime)
df['FinishTime'] = df.Time + starttime # as pointed out by Trenton McKinney, .assign() is only one way to create new columns
# creating with df['new_col'] has the benefit of not having to copy the full df
print(df)
Output
Name Time FinishTime
0 Kate 00:03:00 2020-02-04 00:03:00
1 Sarah 00:06:00 2020-02-04 00:06:00
2 Isabell 00:01:00 2020-02-04 00:01:00
3 Connie 00:07:00 2020-02-04 00:07:00
4 Elsa 00:23:00 2020-02-04 00:23:00
5 Anne 00:03:00 2020-02-04 00:03:00
6 Lin 00:04:00 2020-02-04 00:04:00
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html
Maybe not at all cost, but pandas takes advantage of C implementations to improve performance by several orders of magnitude. There are many (many) common functions already implemented for our convenience.
Here is a great stackoverflow conversation about this very topic.
Upvotes: 2