Can't Tell
Can't Tell

Reputation: 13416

New Column With Repeated Value from a different column

I have the following code. I need to add a column deaths_last_tuesday which shows the deaths from last Tuesday, for each day.

import pandas as pd

data = {'date': ['2014-05-01', '2014-05-02', '2014-05-03', '2014-05-04', '2014-05-05', '2014-05-06', '2014-05-07',
                 '2014-05-08', '2014-05-09', '2014-05-10', '2014-05-11', '2014-05-12', '2014-05-13', '2014-05-14',
                 '2014-05-15', '2014-05-16', '2014-05-17', '2014-05-18', '2014-05-19', '2014-05-20'],
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41, 23, 56, 23, 34, 23, 67, 54, 34, 45, 12]}

df = pd.DataFrame(data, columns=['date', 'battle_deaths'])

df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.dayofweek
df = df.set_index('date')
df.sort_index()


            battle_deaths  day_of_week deaths_last_tuesday
date                                  
2014-05-01             34            3 
2014-05-02             25            4   24
2014-05-03             26            5   24
2014-05-04             15            6   24
2014-05-05             15            0   24
2014-05-06             14            1   24
2014-05-07             26            2   24
2014-05-08             25            3   24
2014-05-09             62            4   25
2014-05-10             41            5   25
2014-05-11             23            6   25
2014-05-12             56            0   25

I want to do this so that I want to compare the deaths of each day with the deaths of the previous Tuesday.

Upvotes: 1

Views: 33

Answers (1)

jezrael
jezrael

Reputation: 862511

Use:

df['deaths_last_tuesday'] = df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift()

print (df)
            battle_deaths  day_of_week  deaths_last_tuesday
date                                                       
2014-05-01             34            3                  NaN
2014-05-02             25            4                 34.0
2014-05-03             26            5                 34.0
2014-05-04             15            6                 34.0
2014-05-05             15            0                 34.0
2014-05-06             14            1                 34.0
2014-05-07             26            2                 34.0
2014-05-08             25            3                 34.0
2014-05-09             62            4                 25.0
2014-05-10             41            5                 25.0
2014-05-11             23            6                 25.0
2014-05-12             56            0                 25.0
2014-05-13             23            1                 25.0
2014-05-14             34            2                 25.0
2014-05-15             23            3                 25.0
2014-05-16             67            4                 23.0
2014-05-17             54            5                 23.0
2014-05-18             34            6                 23.0
2014-05-19             45            0                 23.0
2014-05-20             12            1                 23.0

Explanation:

First compare by eq (==):

print (df['day_of_week'].eq(3))
date
2014-05-01     True
2014-05-02    False
2014-05-03    False
2014-05-04    False
2014-05-05    False
2014-05-06    False
2014-05-07    False
2014-05-08     True
2014-05-09    False
2014-05-10    False
2014-05-11    False
2014-05-12    False
2014-05-13    False
2014-05-14    False
2014-05-15     True
2014-05-16    False
2014-05-17    False
2014-05-18    False
2014-05-19    False
2014-05-20    False
Name: day_of_week, dtype: bool

Then create missing values for not matched values by where:

print (df['battle_deaths'].where(df['day_of_week'].eq(3)))
date
2014-05-01    34.0
2014-05-02     NaN
2014-05-03     NaN
2014-05-04     NaN
2014-05-05     NaN
2014-05-06     NaN
2014-05-07     NaN
2014-05-08    25.0
2014-05-09     NaN
2014-05-10     NaN
2014-05-11     NaN
2014-05-12     NaN
2014-05-13     NaN
2014-05-14     NaN
2014-05-15    23.0
2014-05-16     NaN
2014-05-17     NaN
2014-05-18     NaN
2014-05-19     NaN
2014-05-20     NaN
Name: battle_deaths, dtype: float64

Forwrd fill missing values:

print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill())
date
2014-05-01    34.0
2014-05-02    34.0
2014-05-03    34.0
2014-05-04    34.0
2014-05-05    34.0
2014-05-06    34.0
2014-05-07    34.0
2014-05-08    25.0
2014-05-09    25.0
2014-05-10    25.0
2014-05-11    25.0
2014-05-12    25.0
2014-05-13    25.0
2014-05-14    25.0
2014-05-15    23.0
2014-05-16    23.0
2014-05-17    23.0
2014-05-18    23.0
2014-05-19    23.0
2014-05-20    23.0
Name: battle_deaths, dtype: float64

And last shift:

print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift())
date
2014-05-01     NaN
2014-05-02    34.0
2014-05-03    34.0
2014-05-04    34.0
2014-05-05    34.0
2014-05-06    34.0
2014-05-07    34.0
2014-05-08    34.0
2014-05-09    25.0
2014-05-10    25.0
2014-05-11    25.0
2014-05-12    25.0
2014-05-13    25.0
2014-05-14    25.0
2014-05-15    25.0
2014-05-16    23.0
2014-05-17    23.0
2014-05-18    23.0
2014-05-19    23.0
2014-05-20    23.0
Name: battle_deaths, dtype: float64

Upvotes: 1

Related Questions