Yun Tae Hwang

Reputation: 1471

Percent change in a specific time period using Python

I have a DataFrame with a datetime index. Here is some sample data:

import pandas as pd

data1 = {'date' : ['20190219 093100', '20190219 103200','20190219 171200','20190219 193900','20190219 194500','20190220 093500','20190220 093600'],
     'number' : [18.6125, 12.85, 14.89, 15.8301, 15.85, 14.916 , 14.95]}

df1 = pd.DataFrame(data1)
df1 = df1.set_index('date')
df1.index = pd.to_datetime(df1.index).strftime('%Y-%m-%d %H:%M:%S')

I want to create a new column named "New_column" holding a categorical value, 'Yes' or 'No', depending on whether the value in the "number" column increases by at least 20 percent later in the same day.

In this sample data, only the second value, 12.85, should be "Yes", because it has risen by 23.35 percent by the timestamp "2019-02-19 19:45:00".

Even though the first value is 25% greater than the third value, the first value comes earlier in time, so this is not an increase and should not be counted.

After the process, the last row of each day should have NaN in "New_column".

I have been trying many different ways to do it using:

How can I do this in a Pythonic way?

Upvotes: 0

Views: 738

Answers (1)

ulmefors

Reputation: 516

Initial setup

import numpy as np
import pandas as pd

data = {
    'datetime': ['20190219 093100', '20190219 103200', '20190219 171200', '20190219 193900', '20190219 194500', '20190220 093500', '20190220 093600'],
    'number': [18.6125, 12.85, 14.89, 15.8301, 15.85, 14.916, 14.95]
}
df = pd.DataFrame(data)
# Parse the 'YYYYMMDD HHMMSS' strings explicitly; a unitless astype('datetime64') fails on recent pandas
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y%m%d %H%M%S')
df = df.sort_values('datetime')
df['date'] = df['datetime'].dt.date
df['New_column'] = 'No'

Find all rows that see an increase of at least 20% later in the same day

# Compare every row with every other row (O(n^2), fine for small data)
indices_true = set()
for idx_low, row_low in df.iterrows():
    for idx_high, row_high in df.iterrows():
        if (row_low['date'] == row_high['date'] and
            row_low['datetime'] < row_high['datetime'] and
            row_low['number'] * 1.2 <= row_high['number']):
            indices_true.add(idx_low)
# Assign 'Yes' for the true rows
for i in indices_true:
    df.loc[i, 'New_column'] = 'Yes'

# Last timestamp every day assigned as NaN
df.loc[df['date'] != df['date'].shift(-1), 'New_column'] = np.nan

# Optionally convert to categorical variable
df['New_column'] = pd.Categorical(df['New_column'])

Output

>>> df

    datetime                number         date     New_column
0   2019-02-19 09:31:00     18.6125     2019-02-19  No
1   2019-02-19 10:32:00     12.8500     2019-02-19  Yes
2   2019-02-19 17:12:00     14.8900     2019-02-19  No
3   2019-02-19 19:39:00     15.8301     2019-02-19  No
4   2019-02-19 19:45:00     15.8500     2019-02-19  NaN
5   2019-02-20 09:35:00     14.9160     2019-02-20  No
6   2019-02-20 09:36:00     14.9500     2019-02-20  NaN
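
Vectorized alternative (sketch)

The nested loop above compares every pair of rows, which gets slow on larger frames. A possible alternative, offered as a sketch rather than a tested drop-in, is to compute for each row the highest value that occurs later on the same day and compare against that. The names `day`, `running_max`, `future_max` and `labels` are introduced here for illustration only, and "at least 20 percent" is assumed to mean `>=`.

```python
import numpy as np
import pandas as pd

data = {
    'datetime': ['20190219 093100', '20190219 103200', '20190219 171200',
                 '20190219 193900', '20190219 194500', '20190220 093500',
                 '20190220 093600'],
    'number': [18.6125, 12.85, 14.89, 15.8301, 15.85, 14.916, 14.95],
}
df = pd.DataFrame(data)
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y%m%d %H%M%S')
df = df.sort_values('datetime').reset_index(drop=True)
day = df['datetime'].dt.date

# Running maximum taken from the end of each day backwards, so each row
# holds the highest value from itself to the end of its day.
running_max = df['number'].iloc[::-1].groupby(day).cummax().iloc[::-1]

# Shift within each day so a row only sees strictly later values;
# the last row of a day has nothing after it and becomes NaN.
future_max = running_max.groupby(day).shift(-1)

# Assumption: "at least 20 percent" is read as >= 1.2 times the value.
labels = pd.Series(
    np.where(future_max >= df['number'] * 1.2, 'Yes', 'No'),
    index=df.index,
    dtype=object,
)
# Blank out the last row of each day, as the question asks.
df['New_column'] = labels.mask(future_max.isna())
```

On the sample data this should reproduce the table above. It makes one pass per group instead of comparing all pairs, at the cost of being a little less obvious than the explicit loop.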

Upvotes: 2
