math17

Reputation: 17

Python alternative to R mutate

I want to convert R code into Python. The code in R is

df %>% mutate(N = if_else(Interval != lead(Interval) | row_number() == n(), criteria/Count, NA_real_)) 

In Python I wrote the following:

import pandas as pd
import numpy as np
df = pd.read_table('Fd.csv', sep=',')

for i in range(1,len(df.Interval)-1):
    x = df.Interval[i]
    n = df.Interval[i+1]
    if x != n | x==df.Interval.tail().all():
        df['new']=(df.criteria/df.Count)
    else:
        df['new']='NaN'
df.to_csv (r'dataframe.csv', index = False, header=True)

However, the output returns all NaNs.

Here is what the data looks like

Interval | Count    |   criteria    
0        0               0                             
0        1               0                            
0        2               0                             
0        3               0                             
1        4               1                             
1        5               2                             
1        6               3                            
1        7               4                             
2        8               1                          
2        9               2       
3        10              3

This is what I want to get (the last row also needs to be handled):

Interval | Count    |   criteria  |  new
0        0               0
0        1               0
0        2               0
0        3               0       0
1        4               1
1        5               2
1        6               3
1        7               4       0.5714
2        8               1
2        9               2       0.2222
3        10              3       0.3

If anyone could help me find my mistake, I would greatly appreciate it.

Upvotes: 0

Views: 281

Answers (1)

insulanus

Reputation: 271

1. Start indexing at 0

The first thing to note is that Python indexing starts at 0, whereas R starts at 1. Your loop, range(1, len(df.Interval)-1), therefore skips both the first row (index 0) and the last row (index len(df.Interval)-1), so you need to change its index range.
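To see which rows each loop visits, the two ranges can be listed directly. This is only an illustrative sketch; the value 11 is the number of rows in the table from the question, and the variable names are made up for the example.

```python
n_rows = 11  # number of rows in the question's table

# Loop bounds used in the question: starts at 1 and stops before the last row.
original = list(range(1, n_rows - 1))

# Corrected bounds: starts at 0 and includes the last row.
corrected = list(range(n_rows))

print(original)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(corrected)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```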

2. Specify row indices

When calling

df['new']=(df.criteria/df.Count)

or

df['new']='NaN'

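you are assigning to every row of the "new" column at once, so each loop iteration overwrites the whole column and only the result of the last iteration survives. However, you intend to set the value only in particular rows. Therefore, you need to specify the row, for example with df.loc[row, 'new'].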
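The difference between the two kinds of assignment can be seen on a tiny frame. This is a sketch with made-up values (three rows, not the data from the question), just to contrast whole-column assignment with single-cell assignment through df.loc.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
df = pd.DataFrame({"Count": [1, 2, 4], "criteria": [1, 1, 1]})

# Assigning to df["new"] replaces the entire column in one step.
df["new"] = df["criteria"] / df["Count"]
whole_column = df["new"].tolist()   # [1.0, 0.5, 0.25]

# Assigning through df.loc[row, column] changes one cell only.
df["new"] = float("nan")
df.loc[1, "new"] = df.loc[1, "criteria"] / df.loc[1, "Count"]
single_cell = df["new"].tolist()    # [nan, 0.5, nan]
```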

3. Working example

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Interval": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3],
    "Count": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "criteria": [0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3],
})
df["new"] = np.nan  # real missing values, not the string "NaN"

last_row = len(df) - 1
for row in range(len(df)):
    current_value = df.loc[row, "Interval"]
    # The last row has no successor, so it is compared with itself.
    next_value = df.loc[min(row + 1, last_row), "Interval"]
    # Fill only the last row of each Interval group, including the final row.
    if (current_value != next_value) or (row == last_row):
        df.loc[row, "new"] = df.loc[row, "criteria"] / df.loc[row, "Count"]
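The loop can also be avoided altogether. The following is a sketch of a vectorized variant that is closer in spirit to the R mutate call; it is an alternative to the loop above, not part of it. It relies on Series.shift(-1) as a stand-in for dplyr's lead() and on Series.where to keep values only where a condition holds. The names next_interval and is_group_end are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Interval": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3],
    "Count": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "criteria": [0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3],
})

# shift(-1) moves every value up one row, so each row sees the next row's Interval.
next_interval = df["Interval"].shift(-1)

# True where the next Interval differs. The last row is compared with NaN,
# and a != comparison with NaN is True, so the last row is flagged as well.
is_group_end = df["Interval"] != next_interval

# Keep criteria / Count on flagged rows; all other rows become NaN.
df["new"] = (df["criteria"] / df["Count"]).where(is_group_end)

print(df)
```

With the sample data, only rows 3, 7, 9 and 10 receive a value. Unlike in R, no separate check for the last row is needed here, because the comparison with the missing shifted value already evaluates to True.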

Upvotes: 1
