Reputation: 27
I have a pandas DataFrame and I am trying to compute the sum of two columns and store it in another column called 'Total', but when I verify the number, it is twice the actual sum. I wonder why.
Code snippet:
def pixel_count(folder):
    #area = 8.37
    data = pd.DataFrame(columns = ['Image Name', 'Black Pixels', 'White Pixels', 'Total'])
    data['Image Name'] = dirs
    count_0 = []
    count_1 = []
    for item in dirs:
        img_path = folder+'/'+item
        img = cv2.imread(img_path)
        pixels = img.reshape(-1,3)
        counts = defaultdict(int)
        for pixel in pixels:
            if pixel[0] == pixel[1] == pixel[2]:
                counts[pixel[0]] += 1
        count_0.append(counts[0])
        count_1.append(counts[255])
    data['Black Pixels'] = count_0
    data['White Pixels'] = count_1
    data['Total'] = data.loc[:,['Black Pixels','White Pixels']].sum(axis=1)
    data = data[data['Image Name'].str[0:4]=='mask']
    data.loc['Column_Total']= data.sum(numeric_only=True, axis=0)
    data = data.set_index('Image Name')
    data.loc[:,'Total'] = data.sum(numeric_only=True, axis=1)
    return data
pixel_count(path)
Output:
             Black Pixels  White Pixels      Total
Image Name
mask_14.png     1604815.0      645185.0  4500000.0
mask_4.png      1877175.0      372825.0  4500000.0
mask_5.png      1629168.0      620832.0  4500000.0
mask_15.png     1687744.0      562256.0  4500000.0
mask_17.png     1859852.0      390148.0  4500000.0
mask_7.png      1529366.0      720634.0  4500000.0
The total should be 2250000.0, but it's 4500000.0.
Appreciate the help.
Upvotes: 0
Views: 276
Reputation: 8219
In your code
data['Total'] = data.loc[:,['Black Pixels','White
...
data.loc[:,'Total'] = data.sum(numeric_only=True, axis=1)
the sum on the second line runs over every numeric column, including the Total calculated on the first line, so Total is double-counted. I do not think you need both lines.
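To make the double counting concrete, here is a minimal sketch that reproduces it on a tiny DataFrame. The two rows reuse counts from the question's output; the variable names first_total and second_total are only for illustration and are not part of the original code.

```python
import pandas as pd

# Two rows of counts copied from the question's output (illustrative data only)
data = pd.DataFrame({
    "Image Name": ["mask_14.png", "mask_4.png"],
    "Black Pixels": [1604815, 1877175],
    "White Pixels": [645185, 372825],
})
data = data.set_index("Image Name")

# First line from the question: sums only the two named columns
data["Total"] = data.loc[:, ["Black Pixels", "White Pixels"]].sum(axis=1)
first_total = data["Total"].copy()

# Second line from the question: sums every numeric column, Total included
data.loc[:, "Total"] = data.sum(numeric_only=True, axis=1)
second_total = data["Total"].copy()

print(first_total.tolist())   # black + white
print(second_total.tolist())  # black + white + previous Total
```

In this sketch, removing the second assignment, or restricting it to the two count columns, leaves Total at the expected black-plus-white value.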
Upvotes: 1
Reputation: 92
With pandas you can add two columns more simply:
data['Total'] = data['Black Pixels'] + data['White Pixels']
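One property of naming the columns explicitly is that the assignment can be run more than once without changing the result, because Total never feeds back into itself. A small sketch of that, using illustrative counts from the question and a hypothetical helper add_total that is not in the original code:

```python
import pandas as pd

# Illustrative counts copied from two rows of the question's output
data = pd.DataFrame({
    "Black Pixels": [1604815, 1877175],
    "White Pixels": [645185, 372825],
})

def add_total(df):
    """Hypothetical helper: return a copy with Total = black + white."""
    out = df.copy()
    out["Total"] = out["Black Pixels"] + out["White Pixels"]
    return out

once = add_total(data)
twice = add_total(once)  # Total already exists here, but it is not part of the sum

print(once["Total"].tolist())
print(twice.equals(once))
```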
Upvotes: 0