Reputation: 3041
I have a data-frame like this,
date day_of_wk wk_num wkend_hol_ind freq_daily
1/1/2007 Monday 1 H
1/2/2007 Tuesday 1 H
1/3/2007 Wednesday 1 D
1/4/2007 Thursday 1 D
1/5/2007 Friday 1 D
1/6/2007 Saturday 1 W
1/7/2007 Sunday 1 W
1/8/2007 Monday 2 D
1/9/2007 Tuesday 2 D
1/10/2007 Wednesday 2 D
1/11/2007 Thursday 2 D
1/12/2007 Friday 2 H
1/13/2007 Saturday 2 W
1/14/2007 Sunday 2 W
I am trying to create a new column inside the data frame as "freq_weekly".
This column will have a value of "W" for the last value of "D" for each value of "wk_num" and blank for other columns.
for example, 1/5/2017 will have a value "W" and 1/11/2017 will have a value "W". Others will have blanks in it.
Kindly help me with this.
Upvotes: 0
Views: 43
Reputation: 862691
Idea is create helper mask
for consecutive groups by ne
, shift
and cumsum
, then filter only D
rows and get last values by Series.duplicated
with keep='last'
, then excluded rows by Series.reindex
:
m = df['freq_daily'].eq('D')
mask = m.ne(m.shift()).cumsum()[m].duplicated(keep='last').reindex(df.index, fill_value=True)
If need test more columns for duplicates:
mask = (df.assign(m=m.ne(m.shift()).cumsum())[m]
.duplicated(['wk_num','freq_daily'],keep='last')
.reindex(df.index, fill_value=True))
So possible create new column by numpy.where
:
df['freq_weekly'] = np.where(mask, '', 'W')
print (df)
date day_of_wk wk_num wkend_hol_ind freq_daily freq_weekly
0 1/1/2007 Monday 1 H NaN
1 1/2/2007 Tuesday 1 H NaN
2 1/3/2007 Wednesday 1 NaN D
3 1/4/2007 Thursday 1 NaN D
4 1/5/2007 Friday 1 NaN D W
5 1/6/2007 Saturday 1 W NaN
6 1/7/2007 Sunday 1 W NaN
7 1/8/2007 Monday 2 NaN D
8 1/9/2007 Tuesday 2 NaN D
9 1/10/2007 Wednesday 2 NaN D
10 1/11/2007 Thursday 2 NaN D W
11 1/12/2007 Friday 2 H NaN
12 1/13/2007 Saturday 2 W NaN
13 1/14/2007 Sunday 2 W NaN
Upvotes: 1