Reputation: 215
I have a dataframe index_crisis
and want to create a new column that contains a 1 where the index reaches a local peak and a 0 otherwise.
I don't know how to continue from my code below. The array peak_locations
is:
[ 2 7 9 13 16 18 21] and with month[peak_locations]
I get the months of the peaks.
Date Index
38 2007-06-01 -0.56
39 2007-07-01 -0.36
40 2007-08-01 0.68
41 2007-09-01 0.24
42 2007-10-01 0.22
43 2007-11-01 0.89
44 2007-12-01 0.95
45 2008-01-01 1.53
46 2008-02-01 1.01
47 2008-03-01 1.73
48 2008-04-01 1.39
49 2008-05-01 0.96
50 2008-06-01 1.26
51 2008-07-01 2.37
52 2008-08-01 1.57
53 2008-09-01 2.95
54 2008-10-01 5.7
55 2008-11-01 5.29
56 2008-12-01 5.42
57 2009-01-01 4.99
58 2009-02-01 4.45
59 2009-03-01 4.59
60 2009-04-01 4.2
61 2009-05-01 3.12
62 2009-06-01 1.85
My expected output is a column dummy
that looks like:
0
0
1
0
0
0
0
1
0
1
0
0
0
1
0
0
1
0
1
0
0
1
0
0
0
import numpy as np
import pandas as pd
df = pd.read_csv("index_crisis.csv", parse_dates=True)
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = pd.PeriodIndex(df.Date, freq='M').strftime("%b %Y")
data = df['Index'].values
# at a local peak the sign of the slope flips from +1 to -1, so the second difference is -2
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1
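To see why the `-2` test picks out peaks, here is a minimal sketch on a small made-up array (the values are illustrative, not taken from index_crisis.csv) that prints nothing but keeps each intermediate step in its own variable:

```python
import numpy as np

# Made-up values for illustration only; peaks are the 3 and the 5.
data = np.array([1, 3, 2, 4, 5, 1])

steps = np.diff(data)          # [ 2, -1,  2,  1, -4]  change between neighbours
signs = np.sign(steps)         # [ 1, -1,  1,  1, -1]  rising (+1) or falling (-1)
doublediff = np.diff(signs)    # [-2,  2,  0, -2]      -2 means rising then falling

# np.diff shortens the array by one each time, so shift the hits
# by +1 to get positions in the original data.
peak_locations = np.where(doublediff == -2)[0] + 1   # [1, 4]
```

Note that a flat top such as `[1, 2, 2, 1]` yields a double difference of `-1, -1` rather than `-2`, so this test only finds strict peaks.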
Upvotes: 1
Views: 84
Reputation: 1064
I think that you want to use NumPy's fancy indexing to build your array of ones and zeros: a sequence of integer positions can be used as an indexer for a NumPy array.
Following your example, suppose that your DataFrame is 62 rows long. Then:
>>> peak_locations = [2, 7, 9, 13, 16, 18, 21] # You generated this
>>> dummy = np.zeros(len(df), dtype=int) # I assume length 62 in this example
>>> dummy
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> dummy[peak_locations] = 1 # This is the fancy indexing hotness
>>> dummy
array([0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> df["dummy"] = dummy # Adds the new column
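As a runnable sketch of the whole approach, the snippet below rebuilds the 25 rows shown in the question by hand (an assumption made so the example needs no CSV file; the labels 38 to 62 are copied from the printed frame) and then applies the same fancy-indexing step:

```python
import numpy as np
import pandas as pd

# The 25 "Index" values printed in the question, with the same row labels.
values = [-0.56, -0.36, 0.68, 0.24, 0.22, 0.89, 0.95, 1.53, 1.01, 1.73,
          1.39, 0.96, 1.26, 2.37, 1.57, 2.95, 5.7, 5.29, 5.42, 4.99,
          4.45, 4.59, 4.2, 3.12, 1.85]
df = pd.DataFrame({"Index": values}, index=range(38, 63))

# Peak detection exactly as in the question.
data = df["Index"].to_numpy()
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1

# Start from all zeros, then set the peak positions to 1.
dummy = np.zeros(len(df), dtype=int)
dummy[peak_locations] = 1
df["dummy"] = dummy
```

Because `dummy` is a plain array, it is assigned to the column by position, so it lines up with the rows regardless of what the DataFrame's index labels are.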
Upvotes: 0
Reputation: 3910
idx = df.iloc[peak_locations].index
df['dummy'] = np.where(df.index.isin(idx), 1, 0)
Date Index dummy
38 Jun 2007 -0.56 0
39 Jul 2007 -0.36 0
40 Aug 2007 0.68 1
41 Sep 2007 0.24 0
42 Oct 2007 0.22 0
43 Nov 2007 0.89 0
44 Dec 2007 0.95 0
45 Jan 2008 1.53 1
46 Feb 2008 1.01 0
47 Mar 2008 1.73 1
48 Apr 2008 1.39 0
49 May 2008 0.96 0
50 Jun 2008 1.26 0
51 Jul 2008 2.37 1
52 Aug 2008 1.57 0
53 Sep 2008 2.95 0
54 Oct 2008 5.7 1
55 Nov 2008 5.29 0
56 Dec 2008 5.42 1
57 Jan 2009 4.99 0
58 Feb 2009 4.45 0
59 Mar 2009 4.59 1
60 Apr 2009 4.2 0
61 May 2009 3.12 0
62 Jun 2009 1.85 0
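The reason for going through `iloc` is that `peak_locations` holds row positions, while the frame's index labels start at 38. A minimal sketch of that distinction, using only the first five rows from the question and a hand-written `peak_locations` (both are assumptions for illustration):

```python
import pandas as pd

# First five rows of the question's data, with the original labels 38..42.
df = pd.DataFrame({"Index": [-0.56, -0.36, 0.68, 0.24, 0.22]},
                  index=range(38, 43))
peak_locations = [2]  # position of the value 0.68, not its label

# Translate positions into index labels.
labels = df.index[peak_locations]   # label 40

# Equivalent purely positional assignment, without isin.
df["dummy"] = 0
df.iloc[peak_locations, df.columns.get_loc("dummy")] = 1
```

Using `df.loc[peak_locations]` here would look for a row labelled 2, which does not exist in this frame.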
Upvotes: 1
Reputation: 4007
Find the local maximum by: the value is larger than the previous one AND the next value is not larger than the current one:
series = df['Index']  # keep it as a Series; a NumPy array from .values has no .shift()
s = series > series.shift(1)
df[s & (s != s.shift(-1))]
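A variant of the same shift idea that produces the 0/1 column directly is sketched below. It compares each value with both neighbours, which is a slightly stricter test than the one above, and it rebuilds the question's 25 rows by hand so that it runs without the CSV file (an assumption for illustration):

```python
import pandas as pd

# The 25 "Index" values printed in the question, with the same row labels.
values = [-0.56, -0.36, 0.68, 0.24, 0.22, 0.89, 0.95, 1.53, 1.01, 1.73,
          1.39, 0.96, 1.26, 2.37, 1.57, 2.95, 5.7, 5.29, 5.42, 4.99,
          4.45, 4.59, 4.2, 3.12, 1.85]
df = pd.DataFrame({"Index": values}, index=range(38, 63))

series = df["Index"]  # must stay a Series so that .shift() is available

# A row is a peak when it is larger than both the previous and the next value.
# Comparisons against the NaN that shift() introduces are False, so the first
# and last rows are never flagged.
is_peak = (series > series.shift(1)) & (series > series.shift(-1))
df["dummy"] = is_peak.astype(int)
```

On this data the flagged rows are the ones labelled 40, 45, 47, 51, 54, 56 and 59, the same rows the question expects.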
Upvotes: 0