Reputation: 59
I have the following DataFrame:
z = pd.DataFrame({2016:[4.0,np.nan,6.0,7.0,np.nan],2017:[np.nan,0,5.0,0,np.nan],2018:[4.0,3,np.nan,3,1.0],2019:[2.0,np.nan,np.nan,np.nan,3.4],'ratio':''})
I need the 'ratio' column to hold, for each row, the ratio of the first non-NaN year value to the last non-NaN year value.
For example, for the first row it would be 2016/2019, for the second 2017/2018, for the third 2016/2017, and so on.
I couldn't figure out how to solve this.
Upvotes: 1
Views: 60
Reputation: 195408
Try:
# filter only years
years = df.filter(regex=r"^\d+$")
df["ratio"] = years.apply(
lambda row: row[row.first_valid_index()] / row[row[::-1].first_valid_index()],
axis=1,
)
print(df)
Prints:
   2016  2017  2018  2019     ratio
0   4.0   NaN   4.0   2.0  2.000000
1   NaN   0.0   3.0   NaN  0.000000
2   6.0   5.0   NaN   NaN  1.200000
3   7.0   0.0   3.0   NaN  2.333333
4   NaN   NaN   1.0   3.4  0.294118
Or use .last_valid_index() instead of reversing the row (thanks @BhanuTez):
df["ratio"] = years.apply(
lambda row: row[row.first_valid_index()] / row[row.last_valid_index()],
axis=1,
)
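One case the lambda above does not cover is a row with no valid values at all: first_valid_index() then returns None and the lookup fails. The following is a minimal sketch of a guarded version, assuming you would want NaN in that case. The helper name first_over_last and the extra all-NaN row (index 5) are hypothetical additions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd


def first_over_last(row):
    """Ratio of the first non-NaN value to the last non-NaN value in a row."""
    first = row.first_valid_index()
    last = row.last_valid_index()
    if first is None:  # the row contains no valid values
        return np.nan
    return row[first] / row[last]


# Data from the question, plus one hypothetical all-NaN row (index 5).
z = pd.DataFrame({
    2016: [4.0, np.nan, 6.0, 7.0, np.nan, np.nan],
    2017: [np.nan, 0, 5.0, 0, np.nan, np.nan],
    2018: [4.0, 3, np.nan, 3, 1.0, np.nan],
    2019: [2.0, np.nan, np.nan, np.nan, 3.4, np.nan],
})

year_cols = [2016, 2017, 2018, 2019]  # explicit list instead of a regex filter
z["ratio"] = z[year_cols].apply(first_over_last, axis=1)
print(z)
```

Note that a last valid value of 0 would still give inf (or NaN for 0/0); whether that needs special handling depends on your data.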
Upvotes: 0
Reputation: 13212
Code
If you need a vectorised solution, use the code below:
tmp = df.filter(regex=r"^\d+$").bfill(axis=1).ffill(axis=1)
df["ratio"] = tmp.iloc[:, 0] / tmp.iloc[:, -1]
Note: in your example the DataFrame is named z; I used df here for the reader's convenience.
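To see why the fill trick works, it may help to look at the intermediate frame: after bfill(axis=1) the first column holds each row's first valid value, and after the following ffill(axis=1) the last column holds each row's last valid value. Here is a small sketch using the question's data. The names filled, vectorised and row_wise are mine, and the comparison with the row-wise apply is only a sanity check.

```python
import numpy as np
import pandas as pd

z = pd.DataFrame({
    2016: [4.0, np.nan, 6.0, 7.0, np.nan],
    2017: [np.nan, 0, 5.0, 0, np.nan],
    2018: [4.0, 3, np.nan, 3, 1.0],
    2019: [2.0, np.nan, np.nan, np.nan, 3.4],
})

# Fill backwards, then forwards, along each row.
filled = z.bfill(axis=1).ffill(axis=1)
print(filled)

# First column = first valid value, last column = last valid value.
vectorised = filled.iloc[:, 0] / filled.iloc[:, -1]

# Row-wise version for comparison.
row_wise = z.apply(
    lambda row: row[row.first_valid_index()] / row[row.last_valid_index()],
    axis=1,
)

print(np.allclose(vectorised, row_wise))
```

A side effect worth knowing: a row that is entirely NaN stays NaN after both fills, so the vectorised version yields NaN for it rather than raising an error.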
Upvotes: 1