Reputation: 4241
I have a df
df:
date shares symbol date2
0 20120614 1100 AAT.N NaN
1 20120615 1100 AAT.N NaN
2 20120616 1100 AAT.N NaN
3 20120617 1100 AAT.N NaN
4 20030405 800 ABT.N NaN
5 20030406 800 ABT.N NaN
6 20030407 800 ABT.N NaN
...
#This is what I want:
df:
date shares symbol date2
0 20120614 1100 AAT.N 20120615
1 20120615 1100 AAT.N 20120616
2 20120616 1100 AAT.N 20120617
3 20120617 1100 AAT.N NaN
4 20030405 800 ABT.N 20030406
5 20030406 800 ABT.N 20030407
6 20030407 800 ABT.N NaN
...
For each symbol, I want to set df.ix[0]['date2'] to df.ix[1]['date'] (the next row's date). The symbol changes partway through the dataframe, so I can't just apply this across the whole dataframe.
I was going to loop through and if the symbol for i and i+1 matched:
df.ix[i]['symbol'] == df.ix[i+1]['symbol']
I was going to replace the NaN with the date.
I tried:
df.ix[i]['date2'] = df.ix[i+1]['date'] ##This failed.
I then tried:
a = df.ix[i+1]['date']
df.replace({'date2': i}, a)
###This failed as well
Any recommendations here on
1) Best process to accomplish this?
2) Basic question: How to replace a NaN (or even another number) in a pandas DF?
Thank you.
Upvotes: 1
Views: 939
Reputation: 40618
And here's a one-line solution that might be the most 'pandonic':
In [8]: df['date2'] = df.groupby('symbol')['date'].shift(-1)
In [9]: df
Out[9]:
date shares symbol date2
0 20120614 1100 AAT.N 20120615
1 20120615 1100 AAT.N 20120616
2 20120616 1100 AAT.N 20120617
3 20120617 1100 AAT.N NaN
4 20030405 800 ABT.N 20030406
5 20030406 800 ABT.N 20030407
6 20030407 800 ABT.N NaN
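For anyone who wants to run this end to end, here is a minimal self-contained sketch, assuming a recent pandas and the sample rows from the question. The `Int64` conversion at the end is an optional extra, not part of the one-liner above: because NaN forces the column to float, it is one way to keep the dates as whole numbers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": [20120614, 20120615, 20120616, 20120617,
             20030405, 20030406, 20030407],
    "shares": [1100, 1100, 1100, 1100, 800, 800, 800],
    "symbol": ["AAT.N"] * 4 + ["ABT.N"] * 3,
})

# Shift dates up by one row within each symbol.
# The last row of each symbol has no following row, so it gets NaN.
df["date2"] = df.groupby("symbol")["date"].shift(-1)

# Optional: NaN makes the column float64; the nullable Int64 dtype
# keeps whole numbers and represents the gaps as <NA>.
df["date2"] = df["date2"].astype("Int64")

print(df)
```

The grouped shift keeps the original row index, so the result lines up with df without any reassembly step.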
Upvotes: 3
Reputation: 8558
To compare to DSM's boolean solution, here's the quick and painless groupby solution!
grouped = df.groupby('symbol')
for _, group in grouped:
    # Write each group's shifted dates back into df in a single .loc step
    df.loc[group.index, 'date2'] = group['date'].shift(-1)
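On the question's second point (replacing a single NaN or value), here is a minimal sketch, assuming a recent pandas, where `.ix` is no longer available. The two-row frame and the fill value of 0 are made up for illustration. The likely reason the original attempt had no effect is that `df.ix[i]` returns a separate row object, so the assignment never reaches df.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are taken from the question's sample.
df = pd.DataFrame({
    "date": [20120614, 20120615],
    "date2": [np.nan, np.nan],
})

# One indexing step (row label, column label) writes into df itself.
# Chained indexing such as df.ix[i]['date2'] = ... typically assigns
# into a temporary copy of the row instead.
df.loc[0, "date2"] = df.loc[1, "date"]

# Replace every remaining NaN in the column with a fixed value
# (0 is an arbitrary placeholder chosen for this example).
df["date2"] = df["date2"].fillna(0)

print(df)
```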
Upvotes: 1
Reputation: 352979
I might do something like this:
>>> df
date shares symbol date2
0 20120614 1100 AAT.N NaN
1 20120615 1100 AAT.N NaN
2 20120616 1100 AAT.N NaN
3 20120617 1100 AAT.N NaN
4 20030405 800 ABT.N NaN
5 20030406 800 ABT.N NaN
6 20030407 800 ABT.N NaN
>>> same_symbols = df['symbol'] == df['symbol'].shift(-1)
>>> df.loc[same_symbols, 'date2'] = df['date'].shift(-1)
>>> df
date shares symbol date2
0 20120614 1100 AAT.N 20120615
1 20120615 1100 AAT.N 20120616
2 20120616 1100 AAT.N 20120617
3 20120617 1100 AAT.N NaN
4 20030405 800 ABT.N 20030406
5 20030406 800 ABT.N 20030407
6 20030407 800 ABT.N NaN
This finds where the symbols are the same from one row to the next:
>>> same_symbols
0 True
1 True
2 True
3 False
4 True
5 True
6 False
Name: symbol, Dtype: bool
and then applies the shifted dates there:
>>> df['date'].shift(-1)
0 20120615
1 20120616
2 20120617
3 20030405
4 20030406
5 20030407
6 NaN
Name: date, Dtype: float64
This assumes the symbol data is contiguous and already sorted (easy enough to impose if it's not that way already).
Alternatively, you could use groupby and then act on each group, reassembling things at the end.
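If the rows are not already contiguous, sorting first is one way to impose that. A rough sketch, assuming a recent pandas and some made-up interleaved rows (the ordering below is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical interleaved data: rows for the two symbols are mixed together.
df = pd.DataFrame({
    "date": [20030406, 20120615, 20030405, 20120614],
    "symbol": ["ABT.N", "AAT.N", "ABT.N", "AAT.N"],
})

# Make each symbol contiguous and ordered by date, then renumber the rows.
df = df.sort_values(["symbol", "date"]).reset_index(drop=True)

# True where the next row belongs to the same symbol.
same_symbols = df["symbol"] == df["symbol"].shift(-1)

# Take the next row's date, but only where the symbol matches;
# everywhere else the value becomes NaN.
df["date2"] = df["date"].shift(-1).where(same_symbols)

print(df)
```

Note that sorting changes the row order, so keep the original index around (skip `reset_index`) if the original order matters.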
Upvotes: 0