Reputation: 121
I have data like this, without z1, what i need is to add a column to DataFrame, so it will add column z1 and represent values as in the example, what it should do is to shift z value equally on 1 day before for the same Start date.
I was thinking it could be done with apply and lambda in pandas, but i`m not sure how to define lambda function
data = pd.read_csv("....")
data["Z"] = data[[
"Start", "Z"]].apply(lambda x:
Upvotes: 3
Views: 1123
Reputation: 863781
You can use DataFrameGroupBy.shift
with merge
:
#if not datetime
df['date'] = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
df1 = df.groupby('start')['z'].shift(freq='1D',periods=1).reset_index()
print (pd.merge(df.reset_index(),df1, on=['start','date'], how='left', suffixes=('','1')))
date start z z1
0 2012-12-01 324 564545 NaN
1 2012-12-01 384 5555 NaN
2 2012-12-01 349 554 NaN
3 2012-12-02 855 635 NaN
4 2012-12-02 324 56 564545.0
5 2012-12-01 341 98 NaN
6 2012-12-03 324 888 56.0
EDIT:
Try find duplicates and fillna
by 0
:
df['date'] = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
df1 = df.groupby('start')['z'].shift(freq='1D',periods=1).reset_index()
df2 = pd.merge(df.reset_index(),df1, on=['start','date'], how='left', suffixes=('','1'))
mask = df2.start.duplicated(keep=False)
df2.ix[mask, 'z1'] = df2.ix[mask, 'z1'].fillna(0)
print (df2)
date start z z1
0 2012-12-01 324 564545 0.0
1 2012-12-01 384 5555 NaN
2 2012-12-01 349 554 NaN
3 2012-12-02 855 635 NaN
4 2012-12-02 324 56 564545.0
5 2012-12-01 341 98 NaN
6 2012-12-03 324 888 56.0
Upvotes: 3