Reputation: 45
I want to create a 'target_start' column in Python:
id | start | end | diff | target_start |
---|---|---|---|---|
12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
12220 | 2019-12-01 | 2020-03-31 | 1 | 2018-04-16 |
12220 | 2020-04-01 | 2020-06-30 | -711 | 2018-04-16 |
11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
This is easy to solve in Excel, but I don't know how to do it in Python. For the first row, target_start is simply start:
df.loc[df.index==0,'target_start']= df['start']
I tried this code, but it didn't work:
import pandas as pd
df=pd.read_excel('./Shift.xlsx')
#if id != id.shift(1) then target_start = start
df.loc[df['id'] != df['id'].shift(1), 'target_start'] = df['start']
#elif: diff != 1 then target_start = start
df.loc[df['diff'].shift(1) != 1, 'target_start'] = df['start']
#else: target_start = target_start.shift(1)
df.loc[(df.index != 0) & (df['id'] == df['id'].shift(1)) & (df['diff'].shift(1) == 1), 'target_start']=df['target_start'].shift(1)
print(df)
The result is:
id | start | end | diff | target_start |
---|---|---|---|---|
12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
12220 | 2019-12-01 | 2020-03-31 | 1 | NaT |
12220 | 2020-04-01 | 2020-06-30 | -711 | NaT |
11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
Does anyone know how to solve this? Thanks in advance!
Upvotes: 2
Views: 77
Reputation: 45
Thank you @quest! It is fantastic :)
I changed one thing after the first else, so that it checks whether the previous row's diff is not 1:
    else:
        if df.iloc[i-1, 3] != 1:
            target_start.append(df.iloc[i, 1])
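So the final working code is:
# Parse the date columns as datetimes
df.start = pd.to_datetime(df.start)
df.end = pd.to_datetime(df.end)
# This line assumes a target_start column already exists in df
df.target_start = pd.to_datetime(df.target_start)
df["id_shift"] = df.id.shift()
# Columns by position: 0 = id, 1 = start, 3 = diff
target_start = [df.iloc[0, 1]]
for i in range(1, df.shape[0]):
    #print(i)
    if df.iloc[i, 0] != df.iloc[i - 1, 0]:
        # New id: the block starts at this row
        target_start.append(df.iloc[i, 1])
    else:
        if df.iloc[i-1, 3] != 1:
            # Previous row's diff is not 1: start a new block here
            target_start.append(df.iloc[i, 1])
        else:
            # Otherwise carry the previous target_start forward
            target_start.append(target_start[i - 1])
df["target_start"] = target_start
del df["id_shift"]
df.head(7)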
Thanks again! You helped a lot.
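For larger frames, a loop-free version may be worth trying. This is only a minimal sketch, assuming the rule is the same as in the loop above: a new block starts when the id changes or when the previous row's diff is not 1. The sample frame is rebuilt by hand from the table in the question (the end column is left out because the rule does not use it), and `is_block_start` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the table in the question (end column omitted)
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})

# A row opens a new block when its id differs from the previous row's id,
# or when the previous row's diff is not 1. On the first row both shifted
# values are NaN, so both comparisons are True and it opens a block.
is_block_start = (df["id"] != df["id"].shift()) | (df["diff"].shift() != 1)

# Keep start only on block-opening rows (NaT elsewhere), then forward-fill.
# Every id change opens a block, so values never leak from one id to the next.
df["target_start"] = df["start"].where(is_block_start).ffill()

print(df)
```

Because the first row of each id always opens a block, no groupby is needed here; if the rule changes, that assumption should be checked again.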
Upvotes: 1
Reputation: 3936
Here is how I would implement the Excel formula you highlighted:
df.start = pd.to_datetime(df.start)
df.end = pd.to_datetime(df.end)
df.target_start = pd.to_datetime(df.target_start)
df["id_shift"] = df.id.shift()
target_start = [df.iloc[0, 1]]
for i in range(1, df.shape[0]):
    print(i)
    if df.iloc[i, 0] != df.iloc[i - 1, 0]:
        target_start.append(df.iloc[i, 1])
    else:
        if df.iloc[i, 3] == 1:
            target_start.append(df.iloc[i, 1])
        else:
            target_start.append(target_start[i - 1])
df["target_start"] = target_start
del df["id_shift"]
It generates the following result:
id start end diff target_start
0 12220 1999-11-22 2008-08-31 3515 1999-11-22
1 12220 2018-04-16 2019-09-15 1 2018-04-16
2 12220 2019-09-16 2019-11-30 1 2019-09-16
3 12220 2019-12-01 2020-03-31 1 2019-12-01
4 12220 2020-04-01 2020-06-30 -711 2019-12-01
5 11132 2018-07-20 2019-09-15 1 2018-07-20
6 11132 2019-09-16 2021-01-01 -44197 2018-07-20
Upvotes: 1