Reputation: 73
I am a beginner with the pandas library and don't know how to solve this. I have a huge data frame (I'm implementing Mask-RCNN model and then generating a CSV with the following dataframe but with many more columns).
Object_id Detection_class Detection_score Proportion_bounding_box
0 3 0.970 2.26
1 3 0.954 23.16
2 4 0.696 6.23
3 9 0.658 0.26
0 3 0.980 13.16
1 4 0.984 5.12
2 45 0.255 0.06
... ... ... ...
What I'm trying to achieve is to add a new column that contains the sum where a condition is met. In this case, I would like to sum the value of Proportion_bounding_box
where the Detection_class
equals 3 or 4. In addition, where the Detection_score
> .60
I've tried groupby()
function successless. It is worthy to mention that every time the Object_id
begins again with 0 it is counting objects from another image, thus the sum will need to start again with 0.
Having an output like this
Object_id Detection_class Detection_score Proportion_bounding_box Sum_cars_trucks
0 3 0.970 2.26 31.65
1 3 0.954 23.16 31.65
2 4 0.456 6.23 31.65
3 9 0.658 0.26 31.65
0 3 0.980 13.16 18.28
1 4 0.984 5.12 18.28
2 45 0.255 0.06 18.28
... ... ... ...
More precisely the value 31.65 is the sum of 2.26 + 23.16 + 6.23 because the condition is met, and then starting the sum again when the Object_id
changes to 0. There is no problem if the value of the sum in this case, 31.65 is repeated in rows where the value of Detection_class
is not 3 or 4(i.e Object_id
row 4 with 9 value ), I'll have to clean the data afterwards anyway.
Any idea will be very appreciated.
Upvotes: 2
Views: 84
Reputation: 862751
Use Series.isin
for replace rows matching mask by Series.where
, then use GroupBy.transform
for new column filled by aggregated values - groups are created by compare 0
values with cumulative sum:
group = df['Object_id'].eq(0).cumsum()
m = df['Detection_class'].isin([3,4])
df['Sum_cars_trucks'] = (df['Proportion_bounding_box'].where(m)
.groupby(group)
.transform('sum'))
print (df)
Object_id Detection_class Detection_score Proportion_bounding_box \
0 0 3 0.970 2.26
1 1 3 0.954 23.16
2 2 4 0.696 6.23
3 3 9 0.658 0.26
4 0 3 0.980 13.16
5 1 4 0.984 5.12
6 2 45 0.255 0.06
Sum_cars_trucks
0 31.65
1 31.65
2 31.65
3 31.65
4 18.28
5 18.28
6 18.28
Detail:
print (df['Proportion_bounding_box'].where(m))
0 2.26
1 23.16
2 6.23
3 NaN
4 13.16
5 5.12
6 NaN
Name: Proportion_bounding_box, dtype: float64
So your real data solution should be: (if there are onlu 0
values it means no value match condition)
pd.options.display.max_columns = None
df = pd.read_csv("https://www.dropbox.com/s/mnmpxif5v43czr6/test_seg.csv?dl=1",
sep = ',',error_bad_lines=False)
condition_fifty = df['Detection_Class'].isin([3,4]) & (df['Detection_Score'] > .50)
condition_seventyfive = df['Detection_Class'].isin([3,4]) & (df['Detection_Score'] > .75)
condition_free = df['Detection_Class'].isin([3,4])
group = df['Object_id'].eq(0).cumsum()
df['Sum_cars_trucks_accuracy_75'] = df['Proportion_bounding_box_0_100'].where(condition_seventyfive).groupby(group).transform('sum')
df['Sum_cars_trucks_accuracy_50'] = df['Proportion_bounding_box_0_100'].where(condition_fifty).groupby(group).transform('sum')
df['Sum_cars_trucks'] = df['Proportion_bounding_box_0_100'].where(condition_free).groupby(group).transform('sum')
print (df.iloc[960:980])
Object_id Image_id Cam_ID Detection_Class Detection_Score \
960 25 25 1701 1 0.098000
961 26 25 1701 3 0.089161
962 27 25 1701 3 0.079296
963 28 25 1701 10 0.076747
964 29 25 1701 10 0.068189
965 30 25 1701 62 0.066939
966 31 25 1701 28 0.066125
967 32 25 1701 3 0.059263
968 33 25 1701 11 0.059223
969 34 25 1701 84 0.058247
970 35 25 1701 10 0.058203
971 36 25 1701 10 0.057868
972 37 25 1701 5 0.055125
973 38 25 1701 1 0.051333
974 39 25 1701 3 0.050456
975 0 26 1701 42 0.785020
976 1 26 1701 3 0.735685
977 2 26 1701 3 0.658764
978 3 26 1701 3 0.574286
979 4 26 1701 10 0.549406
Timestamp_sg_time URL \
960 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
961 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
962 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
963 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
964 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
965 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
966 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
967 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
968 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
969 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
970 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
971 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
972 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
973 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
974 2019-11-13 04:10:00 https://images.data.gov.sg/api/traffic-images/...
975 2019-11-13 04:20:00 https://images.data.gov.sg/api/traffic-images/...
976 2019-11-13 04:20:00 https://images.data.gov.sg/api/traffic-images/...
977 2019-11-13 04:20:00 https://images.data.gov.sg/api/traffic-images/...
978 2019-11-13 04:20:00 https://images.data.gov.sg/api/traffic-images/...
979 2019-11-13 04:20:00 https://images.data.gov.sg/api/traffic-images/...
ymin xmin ymax xmax Bounding_box_area \
960 314.707886 553.487793 424.362549 612.194092 6437.419434
961 142.048508 391.639465 148.185974 399.044922 45.450741
962 148.339020 182.501083 155.744003 191.033142 63.179752
963 59.903748 406.636078 84.600563 413.733093 175.273682
964 120.682381 185.464798 131.528137 191.304535 63.336365
965 398.982178 2.361536 477.532013 105.955078 8137.255859
966 318.272369 533.578552 474.975006 589.919861 8828.832031
967 152.243881 228.395996 168.173294 247.767349 308.574280
968 169.543182 52.104408 193.200562 63.020443 258.244781
969 399.414825 2.308850 476.569275 97.881790 7373.877441
970 81.362755 200.836914 104.848663 210.157623 218.905319
971 104.121696 196.914062 119.572815 211.193527 220.633698
972 239.429794 381.147217 247.518219 389.079407 64.158920
973 137.157761 185.270111 146.607330 189.621353 41.117367
974 430.588135 587.167297 468.736328 639.310303 1989.161499
975 149.503235 471.424500 479.375000 543.734680 23853.087891
976 176.751541 227.258438 191.531586 254.238220 398.762390
977 377.724915 0.000000 465.730133 61.350388 5399.154297
978 189.206009 307.332886 206.914307 333.753540 467.864807
979 89.527046 228.427872 110.863182 239.962189 246.097748
Proportion_bounding_box_0_100 Segmentation_pixels \
960 2.095514 0.276545
961 0.014795 0.116933
962 0.020566 0.053545
963 0.057055 0.003237
964 0.020617 0.003308
965 2.648846 0.002000
966 2.873969 0.002545
967 0.100447 0.004789
968 0.084064 0.006768
969 2.400351 0.015430
970 0.071258 0.025500
971 0.071821 0.053866
972 0.020885 0.079332
973 0.013385 0.132568
974 0.647514 0.198744
975 7.764677 0.024513
976 0.129805 0.030686
977 1.757537 0.039312
978 0.152300 0.050344
979 0.080110 0.057971
Sum_cars_trucks_accuracy_75 Sum_cars_trucks_accuracy_50 Sum_cars_trucks
960 0.361058 0.509355 20.910477
961 0.361058 0.509355 20.910477
962 0.361058 0.509355 20.910477
963 0.361058 0.509355 20.910477
964 0.361058 0.509355 20.910477
965 0.361058 0.509355 20.910477
966 0.361058 0.509355 20.910477
967 0.361058 0.509355 20.910477
968 0.361058 0.509355 20.910477
969 0.361058 0.509355 20.910477
970 0.361058 0.509355 20.910477
971 0.361058 0.509355 20.910477
972 0.361058 0.509355 20.910477
973 0.361058 0.509355 20.910477
974 0.361058 0.509355 20.910477
975 0.000000 2.039642 18.097407
976 0.000000 2.039642 18.097407
977 0.000000 2.039642 18.097407
978 0.000000 2.039642 18.097407
979 0.000000 2.039642 18.097407
Upvotes: 1