Reputation: 83437
Is there any neat way to count the number of changes of value in each column in a data frame in pandas?
I don't want to have to loop myself over each column, e.g.:
import pandas as pd
frame = pd.DataFrame({
'time':[1234567000,1234567005,1234567009],
'X1':[96.32,96.01,96.05],
'X2':[23.88,23.96,23.96]
},columns=['time','X1','X2'])
print(frame)
changes = []
for column_name in frame.columns.values:
print('column_name: {0}'.format(column_name))
changes.append(sum(frame[column_name]!=frame[column_name].shift(1)))
print('changes: {0}'.format(changes))
returns:
time X1 X2
0 1234567000 96.32 23.88
1 1234567005 96.01 23.96
2 1234567009 96.05 23.96
column_name: time
column_name: X1
column_name: X2
changes: [3, 3, 2]
Upvotes: 3
Views: 7242
Reputation: 880797
If the values are numeric you could take the differences between adjacent rows and test if the difference is non-zero. Then take a sum down each column to count the number of changes in value:
In [48]: (frame.diff(axis=0) != 0).sum(axis=0)
Out[48]:
time 3
X1 3
X2 2
dtype: int64
If the values are not necessarily numeric, then a more general way would be to
compare the frame
against itself shift
-ed down by one row -- this is similar to the code you posted, except the operation is done on the entire DataFrame instead of column-by-column:
In [50]: (frame != frame.shift(axis=0)).sum(axis=0)
Out[50]:
time 3
X1 3
X2 2
dtype: int64
The numeric version is faster, the shifted version is more robust.
Upvotes: 12