Reputation: 507
If I want to find the difference between two consecutive rows in a pandas DataFrame, I can simply call the diff function.
I have rows that contain sets of characters. What I want to do now is compute the intersection of the sets in each pair of consecutive rows. In other words, I'd like to use diff, but supply my own function instead. Is there a way to accomplish this in pandas?
example input:
100118231 1 set([])
2 set([142.136.6])
3 set([142.136.6])
4 set([])
5 set([])
6 set([108.0.239])
desired output:
100118231 1 set([]) NaN
2 set([142.136.6]) set([])
3 set([142.136.6]) set([142.136.6])
4 set([]) set([])
5 set([]) set([])
6 set([108.0.239]) set([])
I've tried using shift, but it throws an error:
In [213]: type(tgr.head(1))
Out[213]: pandas.core.frame.DataFrame
In [214]: tt=tgr.apply(lambda x: x['value'].intersection((x['value'].shift(-1))))
AttributeError: 'Series' object has no attribute 'intersection'
Upvotes: 0
Views: 724
Reputation: 13279
The & operator is applied element by element, so there's no need to involve lambdas and the like. Combine the frame with a copy of itself shifted by one row:
> df = pd.DataFrame(['hi', set([142,136,6]), set([142, 137, 6]), set([0, 6])]).iloc[1:]
> df & df.shift(1)
0
1 NaN
2 set([142, 6])
3 set([6])
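Whether & is applied element by element to an object column may depend on the pandas version, and it only covers operations that have an operator. As a rough sketch of the more general "diff with my own function" idea from the question, the rows can be paired explicitly. The helper name pairwise_with_previous is hypothetical (not part of pandas), and the sample values are stored as strings, which is an assumption about the original data.

```python
import numpy as np
import pandas as pd


def pairwise_with_previous(series, func):
    """Apply func(previous, current) to each pair of consecutive rows.

    Hypothetical helper, not a pandas API. The first row has no
    predecessor, so its result is NaN, mirroring what diff() does.
    """
    values = series.tolist()
    # Pair every value with the one before it: (v0, v1), (v1, v2), ...
    results = [func(prev, cur) for prev, cur in zip(values, values[1:])]
    if values:
        results.insert(0, np.nan)
    return pd.Series(results, index=series.index, dtype=object)


# Sample data modelled on the question; the string values are an assumption.
df = pd.DataFrame(
    {"value": [set(), {"142.136.6"}, {"142.136.6"}, set(), set(), {"108.0.239"}]},
    index=range(1, 7),
)

# Intersection of each row's set with the previous row's set.
df["common"] = pairwise_with_previous(df["value"], lambda prev, cur: prev & cur)
print(df)
```

Any two-argument function can be passed in place of the intersection lambda, for example a union or a symmetric difference.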
Upvotes: 1