doomsday
doomsday

Reputation: 45

Data frame append in Python

I have a data frame named df1 in Python as follows:

    SYMBOL   prediction_succesful     DATE        VALUE
0    ABC         Y                  29-03-2018     100
1    DEF         Y                  30-03-2018      96
2    GHI         Y                  01-04-2018     105
3    ABC        NaN                 30-03-2018      55

I want to create another data frame named df3 containing 3 columns named symbol, no_of_predictions and no_of_succesful_predictions where no_of_predictions contains the no of times each value is repeated in the symbol column and no_of_successful_predictions contains the no of times the value in prediction_succesful was Y I tried the following code but it doesnt work:

df3 = DataFrame(columns =['symbol','no_of_predictions','no_of_successful_predictions'])

for i in unique_symbols:
    counter =  0
    counter2 = 0
    for j in df1:
        mask = df1[j].symbol == i
        if mask:
            counter += 1
            mask2 = df1[j].prediction_succesful
            if mask2 == "Y":
                counter2 += 1

    df3.append(['symbol'=i,'no_of_predictions' = counter,'no_of_successful_predictions' = counter2])

Here df1 is the main data frame I am working upon
And unique_symbols is a list containing all the unique symbols in df1 data frame such as unique_symbols = ['ABC','DEF','GHI'] Some help would be appreciated

Upvotes: 0

Views: 88

Answers (2)

fuglede
fuglede

Reputation: 18201

You could let

df3 = df1.groupby('SYMBOL').prediction_succesful.aggregate([len, lambda x: (x == 'Y').sum()])
df3.columns = ['no_of_predictions', 'no_of_successful_predictions']

Upvotes: 1

sacuL
sacuL

Reputation: 51335

If I understand correctly, the following works:

by_symbol = df1.groupby('SYMBOL')

df3 = pd.DataFrame({'no_of_predictions': by_symbol.size(),
      'no_successful_predictions': 
          by_symbol.apply(lambda x: (x['prediction_succesful']=='Y').sum())})

>>> df3
        no_of_predictions  no_successful_predictions
SYMBOL                                              
ABC                     2                          1
DEF                     1                          1
GHI                     1                          1

What is happening is that you're grouping your dataframe by symbol, then getting the size of each "sub dataframe", which is being used for no_of_predictions, and finally, the number of rows in which prediction_successful is Y as your last column

Upvotes: 1

Related Questions