Mato

Reputation: 59

Ratio between columns with NaN values: how to pick, for each row, the first and last columns without NaN in pandas?

I have the following DataFrame:

import numpy as np
import pandas as pd

z = pd.DataFrame({
    2016: [4.0, np.nan, 6.0, 7.0, np.nan],
    2017: [np.nan, 0, 5.0, 0, np.nan],
    2018: [4.0, 3, np.nan, 3, 1.0],
    2019: [2.0, np.nan, np.nan, np.nan, 3.4],
    'ratio': '',
})

I need the 'ratio' column to store, for each row, the ratio between the value in the first year that is not NaN and the value in the last year that is not NaN.

For example, for the first row it would be 2016/2019, for the second 2017/2018, for the third 2016/2017, and so on.

I couldn't figure out how to do this.

Upvotes: 1

Views: 60

Answers (2)

Andrej Kesely

Reputation: 195408

Try:

# filter only years
years = df.filter(regex=r"^\d+$")

df["ratio"] = years.apply(
    lambda row: row[row.first_valid_index()] / row[row[::-1].first_valid_index()],
    axis=1,
)

print(df)

Prints:

   2016  2017  2018  2019     ratio
0   4.0   NaN   4.0   2.0  2.000000
1   NaN   0.0   3.0   NaN  0.000000
2   6.0   5.0   NaN   NaN  1.200000
3   7.0   0.0   3.0   NaN  2.333333
4   NaN   NaN   1.0   3.4  0.294118

OR: Use .last_valid_index() (Thanks @BhanuTez)

df["ratio"] = years.apply(
    lambda row: row[row.first_valid_index()] / row[row.last_valid_index()],
    axis=1,
)
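One caveat, sketched here under the assumption that a row might contain no valid values at all: first_valid_index() returns None for such a row, so indexing with it in the lambda would fail. The helper name first_last_ratio and the shortened sample data below are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

# Shortened sample data; the last row is entirely NaN (an assumed edge case).
df = pd.DataFrame({
    2016: [4.0, np.nan, np.nan],
    2017: [np.nan, 0.0, np.nan],
    2018: [4.0, 3.0, np.nan],
    2019: [2.0, np.nan, np.nan],
})

def first_last_ratio(row):
    """Return first valid value / last valid value, or NaN for an all-NaN row."""
    first = row.first_valid_index()
    if first is None:  # no valid value in this row
        return np.nan
    return row[first] / row[row.last_valid_index()]

# Keep only the columns whose labels look like years.
years = df.filter(regex=r"^\d+$")
df["ratio"] = years.apply(first_last_ratio, axis=1)
print(df)
```

With this guard, an all-NaN row simply gets NaN in the ratio column instead of raising an error.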

Upvotes: 0

Panda Kim

Reputation: 13212

Code

If you need a vectorized operation, use the code below:

tmp = df.filter(regex=r"^\d+$").bfill(axis=1).ffill(axis=1)
df["ratio"] = tmp.iloc[:, 0] / tmp.iloc[:, -1]
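To see why this works: bfill along the columns pulls each row's first valid value into the first column, and the subsequent ffill pushes each row's last valid value into the last column. Here is a self-contained sketch on the question's data (using the name z as in the question, and leaving out the empty 'ratio' placeholder column so that no filter step is needed):

```python
import numpy as np
import pandas as pd

z = pd.DataFrame({
    2016: [4.0, np.nan, 6.0, 7.0, np.nan],
    2017: [np.nan, 0, 5.0, 0, np.nan],
    2018: [4.0, 3, np.nan, 3, 1.0],
    2019: [2.0, np.nan, np.nan, np.nan, 3.4],
})

# bfill fills leading NaNs from the right; ffill then fills trailing NaNs from the left.
tmp = z.bfill(axis=1).ffill(axis=1)

first_valid = tmp.iloc[:, 0]   # first non-NaN value of each row
last_valid = tmp.iloc[:, -1]   # last non-NaN value of each row

z["ratio"] = first_valid / last_valid
print(z)
```

A row with no valid values stays NaN throughout, so its ratio is NaN rather than an error.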

z -> df

In your example the DataFrame variable is named z, but I used df here for the reader's convenience.

Upvotes: 1
