Reputation: 4241

Problems with Replace - pandas dataframe

I have a df

df:
  date     shares  symbol  date2
0 20120614  1100   AAT.N   NaN
1 20120615  1100   AAT.N   NaN
2 20120616  1100   AAT.N   NaN
3 20120617  1100   AAT.N   NaN
4 20030405  800    ABT.N   NaN
5 20030406  800    ABT.N   NaN
6 20030407  800    ABT.N   NaN
...

#This is what I want:
df:
  date     shares  symbol  date2
0 20120614  1100   AAT.N   20120615
1 20120615  1100   AAT.N   20120616
2 20120616  1100   AAT.N   20120617
3 20120617  1100   AAT.N   NaN
4 20030405  800    ABT.N   20030406
5 20030406  800    ABT.N   20030407
6 20030407  800    ABT.N   NaN
...

I want to replace df.ix[0]['date2'] with df.ix[1]['date'], and so on down the rows, for each symbol -- the symbol changes through the dataframe, so I can't just apply this to the whole dataframe.

I was going to loop through and if the symbol for i and i+1 matched:

df.ix[i]['symbol'] == df.ix[i+1]['symbol']

I was going to replace the NaN with the date.

I tried:

df.ix[i]['date2'] = df.ix[i+1]['date']  ##This failed.

I then tried:

a = df.ix[i+1]['date']
df.replace({'date2': i}, a)
###This failed as well

Any recommendations here on

1) Best process to accomplish this?

2) Basic question: How do I replace a NaN (or even another number) in a pandas DF?

Thank you.

Upvotes: 1

Answers (3)

Zelazny7

Reputation: 40648

And here's a one-line solution that might be the most 'pandonic':

In [8]: df['date2'] = df.groupby('symbol')['date'].shift(-1)

In [9]: df
Out[9]:
       date  shares symbol     date2
0  20120614    1100  AAT.N  20120615
1  20120615    1100  AAT.N  20120616
2  20120616    1100  AAT.N  20120617
3  20120617    1100  AAT.N       NaN
4  20030405     800  ABT.N  20030406
5  20030406     800  ABT.N  20030407
6  20030407     800  ABT.N       NaN

Upvotes: 3

David Marx

Reputation: 8558

To compare to DSM's boolean solution, here's the quick and painless groupby solution!

grouped = df.groupby('symbol')
for _, group in grouped:
    # Assign through a single .loc call so the write lands on df itself
    df.loc[group.index, 'date2'] = group['date'].shift(-1)
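
On the second part of the question (replacing a NaN or any other single value): an attempt like df.ix[i]['date2'] = ... chains two indexing steps, so the assignment may land on a temporary copy rather than on df. Here is a minimal sketch of label-based assignment, assuming a pandas version where .loc is available. The three-row frame and the fill value 0 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, cut down from the question
df = pd.DataFrame({
    "date": [20120614, 20120615, 20030405],
    "symbol": ["AAT.N", "AAT.N", "ABT.N"],
    "date2": [np.nan, np.nan, np.nan],
})

# One cell: give the row label and the column name in a single .loc call
df.loc[0, "date2"] = df.loc[1, "date"]

# All remaining NaNs in a column: fillna returns a new Series
# (0 is an arbitrary placeholder, not something the question asks for)
filled = df["date2"].fillna(0)
```

The single .loc call is the key difference from the failed attempts: row and column are resolved together, so pandas writes into the original frame.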

Upvotes: 1

DSM

Reputation: 353499

I might do something like this:

>>> df
       date  shares symbol  date2
0  20120614    1100  AAT.N    NaN
1  20120615    1100  AAT.N    NaN
2  20120616    1100  AAT.N    NaN
3  20120617    1100  AAT.N    NaN
4  20030405     800  ABT.N    NaN
5  20030406     800  ABT.N    NaN
6  20030407     800  ABT.N    NaN
>>> same_symbols = df['symbol'] == df['symbol'].shift(-1)
>>> df.loc[same_symbols, 'date2'] = df['date'].shift(-1)
>>> df
       date  shares symbol     date2
0  20120614    1100  AAT.N  20120615
1  20120615    1100  AAT.N  20120616
2  20120616    1100  AAT.N  20120617
3  20120617    1100  AAT.N       NaN
4  20030405     800  ABT.N  20030406
5  20030406     800  ABT.N  20030407
6  20030407     800  ABT.N       NaN

This finds where the symbols are the same from one row to the next:

>>> same_symbols
0     True
1     True
2     True
3    False
4     True
5     True
6    False
Name: symbol, Dtype: bool

and then applies the shifted dates there:

>>> df['date'].shift(-1)
0    20120615
1    20120616
2    20120617
3    20030405
4    20030406
5    20030407
6         NaN
Name: date, Dtype: float64

This assumes the symbol data is contiguous and already sorted (easy enough to impose if it's not that way already.)
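
If the rows are not already grouped by symbol, something like the following could impose that order first. This is only a sketch: the shuffled four-row frame is invented, and it uses Series.where in place of the masked assignment shown earlier, which should give the same result.

```python
import pandas as pd

# Hypothetical sample data with the symbols interleaved
df = pd.DataFrame({
    "date": [20030406, 20120615, 20030405, 20120614],
    "symbol": ["ABT.N", "AAT.N", "ABT.N", "AAT.N"],
})

# Make each symbol contiguous and date-ordered, then renumber the rows
df = df.sort_values(["symbol", "date"]).reset_index(drop=True)

# True where the next row belongs to the same symbol
same_symbols = df["symbol"] == df["symbol"].shift(-1)

# Keep the shifted date only where the mask holds; NaN elsewhere
df["date2"] = df["date"].shift(-1).where(same_symbols)
```

Note that reset_index(drop=True) discards the original row labels, so skip it if you need to map the result back to the unsorted frame.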

Alternatively you could use groupby and then act on each group, reassembling things at the end.
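
Spelled out, that per-group approach might look roughly like this. It is a sketch on invented sample data, with the symbols deliberately interleaved to show that grouping does not need contiguous rows. The names pieces and result are just placeholders.

```python
import pandas as pd

# Hypothetical sample data; symbols are interleaved on purpose
df = pd.DataFrame({
    "date": [20120614, 20030405, 20120615, 20030406],
    "symbol": ["AAT.N", "ABT.N", "AAT.N", "ABT.N"],
})

pieces = []
for _, group in df.groupby("symbol"):
    group = group.copy()                      # work on a copy, not a view of df
    group["date2"] = group["date"].shift(-1)  # next date within this symbol
    pieces.append(group)

# Reassemble the groups and restore the original row order
result = pd.concat(pieces).sort_index()
```

The shift still follows row order inside each group, so the dates need to be in the desired order within each symbol.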

Upvotes: 0
