BlackAndWhite
BlackAndWhite

Reputation: 73

Add a column by grouping where conditions is met

I am a beginner with the pandas library and don't know how to solve this. I have a huge data frame (I'm implementing Mask-RCNN model and then generating a CSV with the following dataframe but with many more columns).

Object_id  Detection_class  Detection_score   Proportion_bounding_box
    0            3              0.970                2.26
    1            3              0.954                23.16
    2            4              0.696                6.23
    3            9              0.658                0.26
    0            3              0.980                13.16   
    1            4              0.984                5.12
    2            45             0.255                0.06
   ...          ...              ...                 ...

What I'm trying to achieve is to add a new column that contains the sum where a condition is met. In this case, I would like to sum the value of Proportion_bounding_box where the Detection_class equals 3 or 4. In addition, where the Detection_score > .60

I've tried groupby() function successless. It is worthy to mention that every time the Object_id begins again with 0 it is counting objects from another image, thus the sum will need to start again with 0.

Having an output like this

Object_id  Detection_class  Detection_score   Proportion_bounding_box   Sum_cars_trucks
    0            3              0.970                2.26                   31.65
    1            3              0.954                23.16                  31.65 
    2            4              0.456                6.23                   31.65
    3            9              0.658                0.26                   31.65
    0            3              0.980                13.16                  18.28    
    1            4              0.984                5.12                   18.28
    2            45             0.255                0.06                   18.28
   ...          ...              ...                 ...

More precisely the value 31.65 is the sum of 2.26 + 23.16 + 6.23 because the condition is met, and then starting the sum again when the Object_id changes to 0. There is no problem if the value of the sum in this case, 31.65 is repeated in rows where the value of Detection_class is not 3 or 4(i.e Object_id row 4 with 9 value ), I'll have to clean the data afterwards anyway.

Any idea will be very appreciated.

Upvotes: 2

Views: 84

Answers (1)

jezrael
jezrael

Reputation: 862751

Use Series.isin for replace rows matching mask by Series.where, then use GroupBy.transform for new column filled by aggregated values - groups are created by compare 0 values with cumulative sum:

group = df['Object_id'].eq(0).cumsum()
m = df['Detection_class'].isin([3,4])
df['Sum_cars_trucks'] = (df['Proportion_bounding_box'].where(m)
                                                      .groupby(group)
                                                      .transform('sum'))
print (df)
   Object_id  Detection_class  Detection_score  Proportion_bounding_box  \
0          0                3            0.970                     2.26   
1          1                3            0.954                    23.16   
2          2                4            0.696                     6.23   
3          3                9            0.658                     0.26   
4          0                3            0.980                    13.16   
5          1                4            0.984                     5.12   
6          2               45            0.255                     0.06   

   Sum_cars_trucks  
0            31.65  
1            31.65  
2            31.65  
3            31.65  
4            18.28  
5            18.28  
6            18.28  

Detail:

print (df['Proportion_bounding_box'].where(m))
0     2.26
1    23.16
2     6.23
3      NaN
4    13.16
5     5.12
6      NaN
Name: Proportion_bounding_box, dtype: float64

So your real data solution should be: (if there are onlu 0 values it means no value match condition)

pd.options.display.max_columns = None

df = pd.read_csv("https://www.dropbox.com/s/mnmpxif5v43czr6/test_seg.csv?dl=1",
                 sep = ',',error_bad_lines=False)

condition_fifty = df['Detection_Class'].isin([3,4]) & (df['Detection_Score'] > .50)
condition_seventyfive = df['Detection_Class'].isin([3,4]) & (df['Detection_Score'] > .75)
condition_free = df['Detection_Class'].isin([3,4])

group = df['Object_id'].eq(0).cumsum()

df['Sum_cars_trucks_accuracy_75'] = df['Proportion_bounding_box_0_100'].where(condition_seventyfive).groupby(group).transform('sum')
df['Sum_cars_trucks_accuracy_50'] = df['Proportion_bounding_box_0_100'].where(condition_fifty).groupby(group).transform('sum')
df['Sum_cars_trucks'] = df['Proportion_bounding_box_0_100'].where(condition_free).groupby(group).transform('sum')

print (df.iloc[960:980])
     Object_id  Image_id  Cam_ID  Detection_Class  Detection_Score  \
960         25        25    1701                1         0.098000   
961         26        25    1701                3         0.089161   
962         27        25    1701                3         0.079296   
963         28        25    1701               10         0.076747   
964         29        25    1701               10         0.068189   
965         30        25    1701               62         0.066939   
966         31        25    1701               28         0.066125   
967         32        25    1701                3         0.059263   
968         33        25    1701               11         0.059223   
969         34        25    1701               84         0.058247   
970         35        25    1701               10         0.058203   
971         36        25    1701               10         0.057868   
972         37        25    1701                5         0.055125   
973         38        25    1701                1         0.051333   
974         39        25    1701                3         0.050456   
975          0        26    1701               42         0.785020   
976          1        26    1701                3         0.735685   
977          2        26    1701                3         0.658764   
978          3        26    1701                3         0.574286   
979          4        26    1701               10         0.549406   

       Timestamp_sg_time                                                URL  \
960  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
961  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
962  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
963  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
964  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
965  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
966  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
967  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
968  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
969  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
970  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
971  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
972  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
973  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
974  2019-11-13 04:10:00  https://images.data.gov.sg/api/traffic-images/...   
975  2019-11-13 04:20:00  https://images.data.gov.sg/api/traffic-images/...   
976  2019-11-13 04:20:00  https://images.data.gov.sg/api/traffic-images/...   
977  2019-11-13 04:20:00  https://images.data.gov.sg/api/traffic-images/...   
978  2019-11-13 04:20:00  https://images.data.gov.sg/api/traffic-images/...   
979  2019-11-13 04:20:00  https://images.data.gov.sg/api/traffic-images/...   

           ymin        xmin        ymax        xmax  Bounding_box_area  \
960  314.707886  553.487793  424.362549  612.194092        6437.419434   
961  142.048508  391.639465  148.185974  399.044922          45.450741   
962  148.339020  182.501083  155.744003  191.033142          63.179752   
963   59.903748  406.636078   84.600563  413.733093         175.273682   
964  120.682381  185.464798  131.528137  191.304535          63.336365   
965  398.982178    2.361536  477.532013  105.955078        8137.255859   
966  318.272369  533.578552  474.975006  589.919861        8828.832031   
967  152.243881  228.395996  168.173294  247.767349         308.574280   
968  169.543182   52.104408  193.200562   63.020443         258.244781   
969  399.414825    2.308850  476.569275   97.881790        7373.877441   
970   81.362755  200.836914  104.848663  210.157623         218.905319   
971  104.121696  196.914062  119.572815  211.193527         220.633698   
972  239.429794  381.147217  247.518219  389.079407          64.158920   
973  137.157761  185.270111  146.607330  189.621353          41.117367   
974  430.588135  587.167297  468.736328  639.310303        1989.161499   
975  149.503235  471.424500  479.375000  543.734680       23853.087891   
976  176.751541  227.258438  191.531586  254.238220         398.762390   
977  377.724915    0.000000  465.730133   61.350388        5399.154297   
978  189.206009  307.332886  206.914307  333.753540         467.864807   
979   89.527046  228.427872  110.863182  239.962189         246.097748   

     Proportion_bounding_box_0_100  Segmentation_pixels  \
960                       2.095514             0.276545   
961                       0.014795             0.116933   
962                       0.020566             0.053545   
963                       0.057055             0.003237   
964                       0.020617             0.003308   
965                       2.648846             0.002000   
966                       2.873969             0.002545   
967                       0.100447             0.004789   
968                       0.084064             0.006768   
969                       2.400351             0.015430   
970                       0.071258             0.025500   
971                       0.071821             0.053866   
972                       0.020885             0.079332   
973                       0.013385             0.132568   
974                       0.647514             0.198744   
975                       7.764677             0.024513   
976                       0.129805             0.030686   
977                       1.757537             0.039312   
978                       0.152300             0.050344   
979                       0.080110             0.057971   

     Sum_cars_trucks_accuracy_75  Sum_cars_trucks_accuracy_50  Sum_cars_trucks  
960                     0.361058                     0.509355        20.910477  
961                     0.361058                     0.509355        20.910477  
962                     0.361058                     0.509355        20.910477  
963                     0.361058                     0.509355        20.910477  
964                     0.361058                     0.509355        20.910477  
965                     0.361058                     0.509355        20.910477  
966                     0.361058                     0.509355        20.910477  
967                     0.361058                     0.509355        20.910477  
968                     0.361058                     0.509355        20.910477  
969                     0.361058                     0.509355        20.910477  
970                     0.361058                     0.509355        20.910477  
971                     0.361058                     0.509355        20.910477  
972                     0.361058                     0.509355        20.910477  
973                     0.361058                     0.509355        20.910477  
974                     0.361058                     0.509355        20.910477  
975                     0.000000                     2.039642        18.097407  
976                     0.000000                     2.039642        18.097407  
977                     0.000000                     2.039642        18.097407  
978                     0.000000                     2.039642        18.097407  
979                     0.000000                     2.039642        18.097407  

Upvotes: 1

Related Questions