JB_rox

Reputation: 37

How to iterate over rows and assign values to a new column

I have a dataframe with over 75k rows and about 13 existing columns. I want to create a new column based on an if statement:

If a row has the same values as the next row in certain columns, the new column should be 0 for that row; otherwise it should be 1.

The if statement checks two equalities (the columns are tags_list and gateway_id).

Here is what I have tried:

for i in range(1,len(df_sort['date'])-1):

    if (df_sort.iloc[i]['tags_list'] == df_sort.iloc[i+1]['tags_list']) & (df_sort.iloc[i]['gateway_id'] == df_sort[i+1]['gateway_id']):
        df_sort.iloc[i]['Transit']=0
    else:
        df_sort.iloc[i]['Transit']=1

This raises KeyError: 2.

PS: All of the columns have the same number of rows

Upvotes: 0

Views: 445

Answers (2)

brentertainer

Reputation: 2200

There is numpy machinery for this, namely numpy.diff. Consider a DataFrame that already has some generic column 'x' populated.

In [48]: df['x'].values
Out[48]: array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3])

In [49]: df['x_diff'] = (np.diff(df['x'], prepend=0) != 0) * 1

In [50]: df['x_diff'].values
Out[50]: array([0, 0, 0, 0, 1, 0, 0, 1, 0, 1])

If you need the zeros and ones flipped, just change != to ==.
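
For the two-column case in the question, the same idea can be sketched in pandas with shift(-1), which lines each row up against the next one. This is only a minimal sketch: the sample data is made up, and it assumes tags_list holds plain scalar values such as strings.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_sort = pd.DataFrame({
    "tags_list": ["a", "a", "b", "b", "b"],
    "gateway_id": [1, 1, 1, 2, 2],
})

# shift(-1) moves values up by one row, so each row is compared with the next.
same_tags = df_sort["tags_list"] == df_sort["tags_list"].shift(-1)
same_gateway = df_sort["gateway_id"] == df_sort["gateway_id"].shift(-1)

# 0 where both columns match the next row, 1 otherwise.
# The last row has no successor, so it compares against NaN and gets 1.
df_sort["Transit"] = (~(same_tags & same_gateway)).astype(int)

print(df_sort["Transit"].tolist())  # [0, 1, 1, 0, 1]
```

Because this avoids a Python-level loop, it should scale comfortably to the 75k rows mentioned in the question.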

Upvotes: 0

mujjiga

Reputation: 16906

if (df_sort.iloc[i]['tags_list'] == df_sort.iloc[i+1]['tags_list'] and
        df_sort.iloc[i]['gateway_id'] == df_sort.iloc[i+1]['gateway_id']):

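In your code, df_sort[i+1]['gateway_id'] should be df_sort.iloc[i+1]['gateway_id']. Without .iloc, df_sort[i+1] looks for a column labeled i+1, which is what raises the KeyError (2 on the first iteration, since i starts at 1).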

Also, are you sure you want to iterate from 1 and not from 0?
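
Putting the fix together, a corrected loop might look like the sketch below. It also writes to a single cell with df_sort.iloc[i, col] instead of df_sort.iloc[i]['Transit'] = ..., because the latter is chained assignment and may modify a temporary copy rather than the dataframe. The sample data is made up, and starting at 0 and defaulting the last row to 1 are assumptions about what you want.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_sort = pd.DataFrame({
    "tags_list": ["a", "a", "b", "b", "b"],
    "gateway_id": [1, 1, 1, 2, 2],
})

# Assumption: the last row has no next row, so every row defaults to 1.
df_sort["Transit"] = 1
transit_col = df_sort.columns.get_loc("Transit")

# Start at 0 so the first row is compared as well (an assumption).
for i in range(len(df_sort) - 1):
    current, following = df_sort.iloc[i], df_sort.iloc[i + 1]
    if (current["tags_list"] == following["tags_list"]
            and current["gateway_id"] == following["gateway_id"]):
        # Positional write to one cell; avoids chained assignment.
        df_sort.iloc[i, transit_col] = 0

print(df_sort["Transit"].tolist())  # [0, 1, 1, 0, 1]
```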

Upvotes: 1
