Reputation: 37
I have a DataFrame with over 75k rows and about 13 existing columns. I want to create a new column based on an if statement: if a row has the same value as the next row in a certain column, the new column for that row should be 0, otherwise 1.
The if statement checks two equalities, on the columns tags_list and gateway_id.
This is what I have tried:
for i in range(1,len(df_sort['date'])-1):
    if (df_sort.iloc[i]['tags_list'] == df_sort.iloc[i+1]['tags_list']) & (df_sort.iloc[i]['gateway_id'] == df_sort[i+1]['gateway_id']):
        df_sort.iloc[i]['Transit']=0
    else:
        df_sort.iloc[i]['Transit']=1
This raises KeyError: 2.
PS: All of the columns have the same number of rows.
Upvotes: 0
Views: 445
Reputation: 2200
There is numpy machinery for this, namely numpy.diff. Consider a DataFrame that already has some generic column 'x' populated.
In [48]: df['x'].values
Out[48]: array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
In [49]: df['x_diff'] = (np.diff(df['x'], prepend=0) != 0) * 1
In [50]: df['x_diff'].values
Out[50]: array([0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
If you need the zeros and ones flipped, just change != to ==.
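Note that numpy.diff as used above compares each row with the previous one and only works on numeric data, while the question compares each row with the next one on two columns. A minimal sketch of that case with pandas shift follows. The sample data is made up, tags_list is assumed to hold plain strings, and treating the last row (which has no next row) as 1 is an assumption, not something the question specifies.

```python
import pandas as pd

# Hypothetical sample data; the real frame has ~75k rows and more columns.
df_sort = pd.DataFrame({
    "tags_list": ["a", "a", "a", "b", "b", "c"],
    "gateway_id": [1, 1, 2, 2, 2, 3],
})

# shift(-1) aligns every row with the value from the row below it.
# The last row is compared against NaN, so it never counts as "same".
same_as_next = (
    df_sort["tags_list"].eq(df_sort["tags_list"].shift(-1))
    & df_sort["gateway_id"].eq(df_sort["gateway_id"].shift(-1))
)

# 0 where both columns match the next row, 1 otherwise.
df_sort["Transit"] = (~same_as_next).astype(int)
print(df_sort)
```

This avoids the Python-level loop entirely, which usually matters at 75k rows.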
Upvotes: 0
Reputation: 16906
if (df_sort.iloc[i]['tags_list'] == df_sort.iloc[i+1]['tags_list']) & (df_sort.iloc[i]['gateway_id'] == df_sort.iloc[i+1]['gateway_id']):
df_sort[i+1]['gateway_id'] should be df_sort.iloc[i+1]['gateway_id'], because df_sort[i+1] looks for a column labelled i+1, which is where the KeyError comes from.
Also, are you sure you want to iterate from 1 and not from 0?
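One more thing to watch: an assignment like df_sort.iloc[i]['Transit'] = 0 writes to a temporary row object, so the value never reaches the DataFrame. A rough sketch of the loop with that fixed, starting from 0, might look like this. The sample data is invented, and defaulting the last row to 1 is an assumption since it has no next row to compare against.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df_sort.
df_sort = pd.DataFrame({
    "tags_list": ["a", "a", "a", "b", "b", "c"],
    "gateway_id": [1, 1, 2, 2, 2, 3],
})

# Create the column up front; the last row keeps this default (assumption).
df_sort["Transit"] = 1
transit_col = df_sort.columns.get_loc("Transit")

# Start at 0 and stop one row early so that i + 1 is always valid.
for i in range(len(df_sort) - 1):
    same_tags = df_sort["tags_list"].iloc[i] == df_sort["tags_list"].iloc[i + 1]
    same_gateway = df_sort["gateway_id"].iloc[i] == df_sort["gateway_id"].iloc[i + 1]
    # A single .iloc[row, col] assignment writes into the DataFrame itself.
    df_sort.iloc[i, transit_col] = 0 if (same_tags and same_gateway) else 1

print(df_sort)
```

A loop like this will be slow on 75k rows, so it is mainly useful for checking the logic.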
Upvotes: 1