Reputation: 91
I am looking to calculate the percent of years a company has had positive earnings. My dataframe has thousands of companies, and so I am trying to figure out how to isolate each company to perform this calculation.
Using the sample data of one company above and assuming RSG.AX was founded in 2007, I want the resulting column to read:
percentPositiveEarnings
(NaN/12) = NaN
(5/11) = 0.45
(4/10) = 0.4
(3/9) = 0.33
(3/8) = 0.375
(3/7) = 0.429
(3/6) = 0.5
(2/5) = 0.4
(1/4) = 0.25
(1/3) = 0.33
(1/2) = 0.5
(1/1) = 1
Each cell in this column should calculate the number of years the company has had positive earnings divided by the total number of years since it was founded.
I am not sure if I have to use .groupby() to separate each company's data, as I have never used it before. Any help is appreciated!
Upvotes: 0
Views: 177
Reputation: 109576
import numpy as np
import pandas as pd

# Sample data.
df = pd.DataFrame({
    'RIC': ['RSG.AX'] * 12 + ['IBM'] * 2,
    'Date': list(range(2007, 2019)) + list(range(2000, 2002)),
    'FCF': [4.66, -2.36, -9.3, -5.7, 7.7, 1.2, -2.6, -2.4, -4.3, 1.1, 4.22, np.nan, 1, -2]
})
Note that I chose to ignore NaN values rather than have the result become NaN.
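# Sort so the cumulative sums run in chronological order within each company.
df = df.sort_values(['RIC', 'Date']).reset_index(drop=True)
# Running count of positive years divided by running count of non-null years.
pct_profitable = df.groupby('RIC')['FCF'].transform(
    lambda s: s.gt(0).cumsum() / s.notnull().cumsum())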
>>> df.assign(pct_profitable=pct_profitable)
       RIC  Date   FCF  pct_profitable
0      IBM  2000  1.00        1.000000
1      IBM  2001 -2.00        0.500000
2   RSG.AX  2007  4.66        1.000000
3   RSG.AX  2008 -2.36        0.500000
4   RSG.AX  2009 -9.30        0.333333
5   RSG.AX  2010 -5.70        0.250000
6   RSG.AX  2011  7.70        0.400000
7   RSG.AX  2012  1.20        0.500000
8   RSG.AX  2013 -2.60        0.428571
9   RSG.AX  2014 -2.40        0.375000
10  RSG.AX  2015 -4.30        0.333333
11  RSG.AX  2016  1.10        0.400000
12  RSG.AX  2017  4.22        0.454545
13  RSG.AX  2018   NaN        0.454545
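If you would rather have the result be NaN for years where FCF itself is missing, as in the expected output in the question, one possible variation is to mask those rows after computing the running ratio. This is only a sketch under that assumption; the column name pct_profitable_nan is made up for illustration, and the sample data is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as above, repeated so this snippet is self-contained.
df = pd.DataFrame({
    'RIC': ['RSG.AX'] * 12 + ['IBM'] * 2,
    'Date': list(range(2007, 2019)) + list(range(2000, 2002)),
    'FCF': [4.66, -2.36, -9.3, -5.7, 7.7, 1.2, -2.6, -2.4, -4.3, 1.1, 4.22, np.nan, 1, -2],
})
df = df.sort_values(['RIC', 'Date']).reset_index(drop=True)

reported = df['FCF'].notna()                   # True where a value exists
positive = df['FCF'].gt(0).astype(int)         # NaN compares as False, so it counts as 0

# Running counts within each company, in chronological order.
pos_years = positive.groupby(df['RIC']).cumsum()
all_years = reported.astype(int).groupby(df['RIC']).cumsum()

# Keep the ratio only where FCF was reported; other rows become NaN.
# 'pct_profitable_nan' is a hypothetical column name.
df['pct_profitable_nan'] = (pos_years / all_years).where(reported)
```

With this variant the 2018 row for RSG.AX comes out as NaN instead of repeating the 2017 ratio, while every other row matches the result above.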
Upvotes: 1