Vaibhav Saxena

Reputation: 27

Pandas df returns twice the actual sum of 2 columns

I have a pandas dataframe and I am trying to compute the sum of two columns and put it into another column called 'Total', but when I verify the number, it is twice the actual sum. I wonder why.

Code snippet:

def pixel_count(folder):
    #area = 8.37
    data = pd.DataFrame(columns = ['Image Name', 'Black Pixels', 'White Pixels', 'Total'])
    data['Image Name'] = dirs
    count_0 = []
    count_1 = []
    for item in dirs:
        img_path = folder+'/'+item
        img = cv2.imread(img_path)
        pixels = img.reshape(-1,3)

        counts = defaultdict(int)
        for pixel in pixels:
            if pixel[0] == pixel[1] == pixel[2]:
                counts[pixel[0]] += 1

        count_0.append(counts[0])
        count_1.append(counts[255])
    data['Black Pixels'] = count_0
    data['White Pixels'] = count_1
    data['Total'] = data.loc[:,['Black Pixels','White Pixels']].sum(axis=1)
    data = data[data['Image Name'].str[0:4]=='mask']
    data.loc['Column_Total']= data.sum(numeric_only=True, axis=0)
    data = data.set_index('Image Name')
    data.loc[:,'Total'] = data.sum(numeric_only=True, axis=1)
    
    return data

pixel_count(path)

Output:

             Black Pixels  White Pixels      Total
Image Name
mask_14.png     1604815.0      645185.0  4500000.0
mask_4.png      1877175.0      372825.0  4500000.0
mask_5.png      1629168.0      620832.0  4500000.0
mask_15.png     1687744.0      562256.0  4500000.0
mask_17.png     1859852.0      390148.0  4500000.0
mask_7.png      1529366.0      720634.0  4500000.0

The total should be 2250000.0, but it's 4500000.0.

Appreciate the help.

Upvotes: 0

Views: 276

Answers (3)

piterbarg

Reputation: 8219

In your code


    data['Total'] = data.loc[:,['Black Pixels','White 
...
    data.loc[:,'Total'] = data.sum(numeric_only=True, axis=1)

the sum on the second line runs across all numeric columns, which by then includes the 'Total' calculated on the first line, so Total is double-counted. I do not think you need both lines.
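To see the effect in isolation, here is a minimal sketch using two of the rows from the question's output as sample data (the small frame is built by hand here, not read from images). It applies the same two assignments and compares the results:

```python
import pandas as pd

# Sample values copied from two rows of the question's output
data = pd.DataFrame(
    {"Black Pixels": [1604815, 1877175], "White Pixels": [645185, 372825]},
    index=["mask_14.png", "mask_4.png"],
)

# First assignment: Total = Black Pixels + White Pixels
data["Total"] = data[["Black Pixels", "White Pixels"]].sum(axis=1)
first_total = data["Total"].copy()

# Second assignment: the row sum now also includes the existing Total column
data["Total"] = data.sum(numeric_only=True, axis=1)

print(first_total.tolist())    # [2250000, 2250000]
print(data["Total"].tolist())  # [4500000, 4500000]
```

Dropping the second assignment leaves the expected 2250000 in each row.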

Upvotes: 1

jsgounot

Reputation: 709

Why not simply use df["Total"] = df.sum(axis=1)?

Upvotes: 0

kasper

Reputation: 92

With pandas you can add two columns more simply:

data['Total'] = data['Black Pixels'] + data['White Pixels']
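As a rough sketch of how this could fit into the rest of the question's function, the snippet below computes 'Total' once and only then appends the 'Column_Total' row. The frame is built by hand from two rows of the question's output, so the image-reading step is left out; that simplification is an assumption for illustration.

```python
import pandas as pd

# Sample values copied from two rows of the question's output
data = pd.DataFrame(
    {
        "Image Name": ["mask_14.png", "mask_4.png"],
        "Black Pixels": [1604815, 1877175],
        "White Pixels": [645185, 372825],
    }
).set_index("Image Name")

# Compute the per-image total exactly once
data["Total"] = data["Black Pixels"] + data["White Pixels"]

# Append a row holding the sum of each column; no second row-wise sum is needed
data.loc["Column_Total"] = data.sum(numeric_only=True, axis=0)

print(data)
```

Because 'Total' is never summed row-wise a second time, each image row keeps the value Black Pixels + White Pixels.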

Upvotes: 0
