Junaid Mohammad

Reputation: 477

Creating a matrix of flags for given dates in a pandas DataFrame

I want to set selected columns of my DataFrame to zero for particular date ranges, and I am stuck finding an efficient solution.

My code creates a matrix of 1s. Let's say dates = 1/10/2016 - 16/8/2018 (i.e. YTD) and matrix1cols = A, B, C, D:

import numpy as np
import pandas as pd
# dates and matrix1cols are as described above
df = pd.DataFrame(np.ones(shape=(len(dates), len(matrix1cols))), index=dates, columns=matrix1cols)

Now I wish to set the Q1 (Jan-Mar) dates for column A to 0, the Q2 dates for B to 0, the Q3 dates for C to 0 and the Q4 dates for column D to 0, for all years in df. (I am essentially creating flags for myself.)

PS: my data spans many years; I have simplified the dataset for ease of explanation.

Upvotes: 3

Views: 476

Answers (2)

piRSquared

Reputation: 294488

Setup

import numpy as np
import pandas as pd

dates = pd.date_range('2016/10/01', '2018/08/16', freq='M')  # month-end dates; pandas >= 2.2 spells this alias 'ME'
matrixcols = list('ABCD')
df = pd.DataFrame(np.ones((len(dates), len(matrixcols)), int), dates, matrixcols)

            A  B  C  D
2016-10-31  1  1  1  1
2016-11-30  1  1  1  1
2016-12-31  1  1  1  1
2017-01-31  1  1  1  1
2017-02-28  1  1  1  1
2017-03-31  1  1  1  1
2017-04-30  1  1  1  1
2017-05-31  1  1  1  1
2017-06-30  1  1  1  1
2017-07-31  1  1  1  1
2017-08-31  1  1  1  1
2017-09-30  1  1  1  1
2017-10-31  1  1  1  1
2017-11-30  1  1  1  1
2017-12-31  1  1  1  1
2018-01-31  1  1  1  1
2018-02-28  1  1  1  1
2018-03-31  1  1  1  1
2018-04-30  1  1  1  1
2018-05-31  1  1  1  1
2018-06-30  1  1  1  1
2018-07-31  1  1  1  1

Numpy slicing

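Create a custom array that defines where to place zeros: one row per quarter, one column per DataFrame column. Indexing it with each date's zero-based quarter produces a mask with the same shape as df.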

i = np.array([
    #A  B  C  D
    [1, 1, 0, 1],  # Q1 -> Only column C is zero
    [1, 0, 0, 0],  # Q2 -> cols B, C, D are zero
    [0, 0, 1, 1],  # Q3 -> cols A, B are zero
    [0, 1, 1, 0],  # Q4 -> cols A, D are zero
])

q = df.index.quarter - 1
df * i[q]

            A  B  C  D
2016-10-31  0  1  1  0
2016-11-30  0  1  1  0
2016-12-31  0  1  1  0
2017-01-31  1  1  0  1
2017-02-28  1  1  0  1
2017-03-31  1  1  0  1
2017-04-30  1  0  0  0
2017-05-31  1  0  0  0
2017-06-30  1  0  0  0
2017-07-31  0  0  1  1
2017-08-31  0  0  1  1
2017-09-30  0  0  1  1
2017-10-31  0  1  1  0
2017-11-30  0  1  1  0
2017-12-31  0  1  1  0
2018-01-31  1  1  0  1
2018-02-28  1  1  0  1
2018-03-31  1  1  0  1
2018-04-30  1  0  0  0
2018-05-31  1  0  0  0
2018-06-30  1  0  0  0
2018-07-31  0  0  1  1

Another view, with the quarter added as an index level, to confirm that the zeros land in the correct quarters.

i = np.array([
    #A  B  C  D
    [1, 1, 0, 1],  # Q1 -> Only column C is zero
    [1, 0, 0, 0],  # Q2 -> cols B, C, D are zero
    [0, 0, 1, 1],  # Q3 -> cols A, B are zero
    [0, 1, 1, 0],  # Q4 -> cols A, D are zero
])

q = df.index.quarter - 1
df.set_index(df.index.to_period('Q'), append=True).swaplevel(0, 1) * i[q]

                   A  B  C  D
2016Q4 2016-10-31  0  1  1  0
       2016-11-30  0  1  1  0
       2016-12-31  0  1  1  0
2017Q1 2017-01-31  1  1  0  1
       2017-02-28  1  1  0  1
       2017-03-31  1  1  0  1
2017Q2 2017-04-30  1  0  0  0
       2017-05-31  1  0  0  0
       2017-06-30  1  0  0  0
2017Q3 2017-07-31  0  0  1  1
       2017-08-31  0  0  1  1
       2017-09-30  0  0  1  1
2017Q4 2017-10-31  0  1  1  0
       2017-11-30  0  1  1  0
       2017-12-31  0  1  1  0
2018Q1 2018-01-31  1  1  0  1
       2018-02-28  1  1  0  1
       2018-03-31  1  1  0  1
2018Q2 2018-04-30  1  0  0  0
       2018-05-31  1  0  0  0
       2018-06-30  1  0  0  0
2018Q3 2018-07-31  0  0  1  1
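
The array above uses an arbitrary pattern. For the layout described in the question (Q1 zeroes A, Q2 zeroes B, Q3 zeroes C, Q4 zeroes D), the mask could be derived from an identity matrix instead of being typed by hand. This is only a sketch under assumptions that are not in the original post: a daily date range for the stated period, and the illustrative names mask and flags.

```python
import numpy as np
import pandas as pd

# Assumed setup: daily dates covering the period from the question
dates = pd.date_range("2016-10-01", "2018-08-16", freq="D")
cols = list("ABCD")
df = pd.DataFrame(np.ones((len(dates), len(cols)), dtype=int),
                  index=dates, columns=cols)

# Row k of the mask is all ones except a zero in column k,
# so quarter k+1 zeroes out the k-th column.
mask = 1 - np.eye(len(cols), dtype=int)

# Zero-based quarter of every date, used to pick one mask row per date
q = np.asarray(df.index.quarter) - 1

# Elementwise product; df itself is left unchanged
flags = df * mask[q]
```

Because the result is a new DataFrame, the original df of ones remains available for building other flag layouts.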

Upvotes: 2

jpp

Reputation: 164773

One solution is to use a simple for loop. Take care to convert your index to datetime as a preliminary step, e.g. via df.index = pd.to_datetime(df.index).

for q, col in enumerate(df, 1):
    df.loc[df.index.quarter == q, col] = 0

Equivalently, in this case, but more verbose:

for q, col in zip(range(1, 5), df):
    df.loc[df.index.quarter == q, col] = 0
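
For anyone who wants to run the loop end to end, here is a self-contained sketch. The daily date range and the summary check at the end are assumptions added for illustration, not part of the original answer. Note that the loop modifies df in place.

```python
import numpy as np
import pandas as pd

# Assumed setup: daily dates covering the period from the question
dates = pd.date_range("2016-10-01", "2018-08-16", freq="D")
df = pd.DataFrame(np.ones((len(dates), 4), dtype=int),
                  index=dates, columns=list("ABCD"))

# Only needed if the index holds strings; a no-op for a DatetimeIndex
df.index = pd.to_datetime(df.index)

# Pair quarter 1..4 with columns A..D and zero the matching rows in place
for q, col in enumerate(df.columns, start=1):
    df.loc[df.index.quarter == q, col] = 0

# Illustrative check: the largest value per quarter and column
summary = df.groupby(df.index.quarter).max()
```

In summary, each quarter's row should show a 0 only in its own column (quarter 1 in A, quarter 2 in B, and so on) and 1 everywhere else.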

Upvotes: 2
