Reputation: 17
I want to convert R code into Python. The code in R is
df %>% mutate(N = if_else(Interval != lead(Interval) | row_number() == n(), criteria/Count, NA_real_))
In Python I wrote the following:
import pandas as pd
import numpy as np
df = pd.read_table('Fd.csv', sep=',')
for i in range(1,len(df.Interval)-1):
x = df.Interval[i]
n = df.Interval[i+1]
if x != n | x==df.Interval.tail().all():
df['new']=(df.criteria/df.Count)
else:
df['new']='NaN'
df.to_csv (r'dataframe.csv', index = False, header=True)
However, the output returns all NaNs.
Here is what the data looks like
Interval | Count | criteria
0 0 0
0 1 0
0 2 0
0 3 0
1 4 1
1 5 2
1 6 3
1 7 4
2 8 1
2 9 2
3 10 3
and this is what I want to get ( I also need to consider the last line)
Interval | Count | criteria | new
0 0 0
0 1 0
0 2 0
0 3 0 0
1 4 1
1 5 2
1 6 3
1 7 4 0.5714
2 8 1
2 9 2 0.2222
3 10 3 0.3333
If anyone could help find my mistake, I would greatly appreciate.
Upvotes: 0
Views: 281
Reputation: 271
1. Start indexing at 0
The first thing to note is that Python starts indexing at 0 (in contrast to R which starts at 1). Therefore, you need to modify the index range of your for-loop.
2. Specify row indices
When calling
df['new']=(df.criteria/df.Count)
or
df['new']='NaN'
you are setting/getting all the values in the "new" column. However, you intend to set the value only in some rows. Therefore, you need to specify the row.
3. Working example
import pandas as pd
df = pd.DataFrame()
df["Interval"] = [0,0,0,0,1,1,1,1,2,2,3]
df["Count"] = [0,1,2,3,4,5,6,7,8,9,10]
df["criteria"] = [0,0,0,0,1,2,3,4,1,2,3]
df["new"] = ["NaN"] * len(df.Interval)
last_row = len(df.Interval) - 1
for row in range(0, len(df.Interval)):
current_value = df.Interval[row]
next_value = df.Interval[min(row + 1, last_row)]
if (current_value != next_value) or (row == last_row):
result = df.loc[row, 'criteria'] / df.loc[row, 'Count']
df.loc[row, 'new'] = result
Upvotes: 1