sampeterson
sampeterson

Reputation: 479

Set increasing value of one column based on value in another column in pandas dataframe

I have a pandas dataframe that looks like this:

import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
df

    date
0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05
5   2021-01-06
6   2021-01-07
7   2021-01-08
8   2021-01-09
9   2021-01-10

I would to add a new column out to this df to indicate when a wedding started with the value 0. The rows after the start date should increment while the rows before the start date should decrement. For example, if the wedding started on '2021-01-05', my desired output is this:

    date        out
0   2021-01-01  -5
1   2021-01-02  -4
2   2021-01-03  -3
3   2021-01-04  -2
4   2021-01-05  -1
5   2021-01-06   0
6   2021-01-07   1
7   2021-01-08   2
8   2021-01-09   3
9   2021-01-10   4

Upvotes: 0

Views: 74

Answers (3)

Ynjxsjmh
Ynjxsjmh

Reputation: 30050

If your date column doesn't duplicate, you can try

df['time'] = range(len(df))
df['time'] = df['time'] - df.set_index('date').loc['2021-01-06', 'time']
print(df)

         date  time
0  2021-01-01    -5
1  2021-01-02    -4
2  2021-01-03    -3
3  2021-01-04    -2
4  2021-01-05    -1
5  2021-01-06     0
6  2021-01-07     1
7  2021-01-08     2
8  2021-01-09     3
9  2021-01-10     4

Or

df['time'] = df.index.values - df['date'].tolist().index('2021-01-06')

Upvotes: 1

BENY
BENY

Reputation: 323356

Doing

df['out'] = (pd.to_datetime(df.date) - pd.to_datetime('2021-01-06')).dt.days
Out[20]: 
0   -5
1   -4
2   -3
3   -2
4   -1
5    0
6    1
7    2
8    3
9    4
Name: date, dtype: int64

Upvotes: 3

ArchAngelPwn
ArchAngelPwn

Reputation: 3046

You can use cumcount() to get the expected output posted

import pandas as pd
d = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']}
df = pd.DataFrame(data=d)
df['Control'] = 1
date_lookback_location = df.loc[df['date'] == '2021-01-06'].index.tolist()[0]
df['time'] = df.sort_values(['date'], ascending=True).groupby(['Control']).cumcount() -date_lookback_location
df[['date', 'time']]

Upvotes: 0

Related Questions