Reputation: 43
I have a pandas dataframe contains the following columns:
sate_1 state_2
----- -----
New York Washington
Ohio Utah
what I want to do is to construct an input date for each distinct combination of state_1 and state_2 so the results would be:
sate_1 state_2 date_time
New York Washington 2017-11-01 00:00
New York Washington 2017-11-01 03:00
New York Washington 2017-11-01 06:00
Ohio Utah 2017-11-01 00:00
Ohio Utah 2017-11-01 03:00
Ohio Utah 2017-11-01 06:00
so basically for each distinct row I want to return 3 time periods starting from today,s date. I have the logic to get the datetime but I'm doing this through a loop of all the records which is a time consuming since my data would contain more than 1M rows . I was wondering is there a way to do this using the apply function, which I'm assuming it will be much faster.
Upvotes: 3
Views: 3252
Reputation: 11105
import pandas as pd
df = pd.DataFrame({'state_1': ['New York', 'Ohio'],
'state_2': ['Washington', 'Utah']})
# df:
# state_1 state_2
# 0 New York Washington
# 1 Ohio Utah
df['date'] = pd.to_datetime('2017-11-01')
# df:
# state_1 state_2 date
# 0 New York Washington 2017-11-01
# 1 Ohio Utah 2017-11-01
pd.concat([
pd.DataFrame(
{'state_1': row.state_1,
'state_2': row.state_2,
'date': pd.date_range(row.date, freq='3h', periods=3)
}
) for i, row in df.iterrows()
], ignore_index=True).loc[:, ['state_1', 'state_2', 'date']]
# df:
# state_1 state_2 date
# 0 New York Washington 2017-11-01 00:00:00
# 1 New York Washington 2017-11-01 03:00:00
# 2 New York Washington 2017-11-01 06:00:00
# 3 Ohio Utah 2017-11-01 00:00:00
# 4 Ohio Utah 2017-11-01 03:00:00
# 5 Ohio Utah 2017-11-01 06:00:00
Upvotes: 2