Reputation: 45
I have a data frame named df1 in Python as follows:
SYMBOL prediction_succesful DATE VALUE
0 ABC Y 29-03-2018 100
1 DEF Y 30-03-2018 96
2 GHI Y 01-04-2018 105
3 ABC NaN 30-03-2018 55
I want to create another data frame named df3 containing 3 columns named symbol, no_of_predictions and no_of_succesful_predictions where no_of_predictions contains the no of times each value is repeated in the symbol column and no_of_successful_predictions contains the no of times the value in prediction_succesful was Y I tried the following code but it doesnt work:
df3 = DataFrame(columns =['symbol','no_of_predictions','no_of_successful_predictions'])
for i in unique_symbols:
counter = 0
counter2 = 0
for j in df1:
mask = df1[j].symbol == i
if mask:
counter += 1
mask2 = df1[j].prediction_succesful
if mask2 == "Y":
counter2 += 1
df3.append(['symbol'=i,'no_of_predictions' = counter,'no_of_successful_predictions' = counter2])
Here df1 is the main data frame I am working upon
And unique_symbols is a list containing all the unique symbols in df1 data frame such as unique_symbols = ['ABC','DEF','GHI']
Some help would be appreciated
Upvotes: 0
Views: 88
Reputation: 18201
You could let
df3 = df1.groupby('SYMBOL').prediction_succesful.aggregate([len, lambda x: (x == 'Y').sum()])
df3.columns = ['no_of_predictions', 'no_of_successful_predictions']
Upvotes: 1
Reputation: 51335
If I understand correctly, the following works:
by_symbol = df1.groupby('SYMBOL')
df3 = pd.DataFrame({'no_of_predictions': by_symbol.size(),
'no_successful_predictions':
by_symbol.apply(lambda x: (x['prediction_succesful']=='Y').sum())})
>>> df3
no_of_predictions no_successful_predictions
SYMBOL
ABC 2 1
DEF 1 1
GHI 1 1
What is happening is that you're grouping your dataframe by symbol, then getting the size of each "sub dataframe", which is being used for no_of_predictions
, and finally, the number of rows in which prediction_successful
is Y
as your last column
Upvotes: 1