Krupali Mistry

Reputation: 644

If a specific value/string occurs anywhere in a DataFrame, I want to sum its index values

I have a DataFrame in which I need to find a specific image name and sum the index values of every cell where it is found. My DataFrame looks like:

c                 1                 2                 3                 4
g
0   180731-1-61.jpg   180731-1-61.jpg   180731-1-61.jpg   180731-1-61.jpg
1  1209270004-2.jpg   180609-2-31.jpg  1209270004-2.jpg  1209270004-2.jpg
2  1209270004-1.jpg   180414-2-38.jpg   180707-1-31.jpg  1209050002-1.jpg
3  1708260004-1.jpg  1209270004-2.jpg   180609-2-31.jpg  1209270004-1.jpg
4  1108220001-5.jpg  1209270004-1.jpg  1108220001-5.jpg  1108220001-2.jpg

I need to find 1209270004-2.jpg in the entire DataFrame. It is found at index 1 (in columns 1, 3 and 4) and at index 3 (in column 2), so I want to add up those index values, which should give 1+3+1+1=6. I tried this code:

img_fname = '1209270004-2.jpg'
df2 = df1[df1.eq(img_fname).any(axis=1)]
total = int(np.sum(df2.index.values))
print(total)

I get a sum of 4, i.e. 1+3=4, but it should be 6. If the string does not occur in every column, e.g. 180707-1-31 occurs only in column 3, then the sum should be 45+45+3+45 = 138. In other words, if the string is not present in a column, take the value 45 instead of the index value.
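
To make the discrepancy reproducible, here is a minimal sketch that rebuilds the frame from the printed table and compares a row-level sum with a per-cell sum. How the real df1 was created is not shown, so the constructor below is an assumption; only the values and the axis names `c` and `g` come from the table. The names `row_level` and `cell_level` are illustrative. `any(axis=1)` flags each matching row only once, so index 1 is counted a single time even though it matches in three columns.

```python
import numpy as np
import pandas as pd

# Rebuilt from the printed table; the original construction is not shown.
df1 = pd.DataFrame(
    {
        1: ["180731-1-61.jpg", "1209270004-2.jpg", "1209270004-1.jpg",
            "1708260004-1.jpg", "1108220001-5.jpg"],
        2: ["180731-1-61.jpg", "180609-2-31.jpg", "180414-2-38.jpg",
            "1209270004-2.jpg", "1209270004-1.jpg"],
        3: ["180731-1-61.jpg", "1209270004-2.jpg", "180707-1-31.jpg",
            "180609-2-31.jpg", "1108220001-5.jpg"],
        4: ["180731-1-61.jpg", "1209270004-2.jpg", "1209050002-1.jpg",
            "1209270004-1.jpg", "1108220001-2.jpg"],
    }
)
df1.index.name = "g"
df1.columns.name = "c"

img_fname = "1209270004-2.jpg"
mask = df1.eq(img_fname)  # True wherever a cell equals the file name

# Row-level: every matching row contributes its index once (1 + 3)
row_level = int(df1.index[mask.any(axis=1).to_numpy()].to_numpy().sum())

# Cell-level: every matching cell contributes its row index (1 + 1 + 1 + 3)
rows, _ = np.nonzero(mask.to_numpy())
cell_level = int(df1.index.to_numpy()[rows].sum())

print(row_level, cell_level)  # 4 6
```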

Upvotes: 1

Views: 67

Answers (2)

Akanksha

Reputation: 179

If the dataset does not have many columns, this also works for your original question:

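import pandas as pd

df1 = pd.DataFrame({"A": ["aa", "ab", "cd", "ab", "aa"], "B": ["ab", "ab", "ab", "aa", "ab"]})
s = 0
# Add up the index labels of the matching rows, one column at a time
for i in df1.columns:
    s = s + sum(df1.index[df1.loc[:, i] == "ab"].tolist())
print(s)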

Input:

    A   B
0   aa  ab
1   ab  ab
2   cd  ab
3   ab  aa
4   aa  ab

Output: 11

Based on the second requirement (the code was originally posted as an image, which is not reproduced here):
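
Since the screenshot for this case is not available, the following is only a sketch of how the loop above might be extended, not the answerer's original code. It assumes that a column with no match contributes a fixed value of 45, as the question describes. The names `missing_value`, `target` and `total` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["aa", "ab", "cd", "ab", "aa"],
                    "B": ["ab", "ab", "ab", "aa", "ab"]})

missing_value = 45  # taken from the question; used when a column has no match
target = "cd"

total = 0
for col in df1.columns:
    # Index labels of the rows where this column equals the target
    matches = df1.index[(df1[col] == target).to_numpy()]
    # Test for "no match" by length, so a hit at index 0 still counts as found
    total += int(matches.to_numpy().sum()) if len(matches) else missing_value

print(total)  # 47: index 2 from column A, plus 45 for column B
```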

Upvotes: 1

jezrael

Reputation: 862731

You can multiply the boolean mask by the index values and then sum:

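img_fname = '1209270004-1.jpg'
# Boolean mask of matches, multiplied row-wise by the index, then summed per column
s = df1.eq(img_fname).mul(df1.index.to_series(), axis=0).sum()
print(s)
1    2
2    4
3    0
4    3
dtype: int64

# Columns whose sum is 0 are treated as "not found" and count as 45
out = np.where(s == 0, 45, s).sum()
print(out)
54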
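
One caveat with the `s == 0` test: a column whose only match sits at index 0 also sums to 0 and would be scored as 45 (in the question's table, 180731-1-61.jpg sits at index 0 in every column). A possible variant, sketched here under the assumption that such a match should count as found (the question does not say), takes the "not found" decision from the mask itself with `any()`. The frame below is a small hypothetical stand-in, and `index_sum` and `missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical data, same layout as the question)
df1 = pd.DataFrame({1: ["a.jpg", "b.jpg", "c.jpg"],
                    2: ["a.jpg", "c.jpg", "b.jpg"],
                    3: ["a.jpg", "a.jpg", "a.jpg"]})

def index_sum(df, name, missing=45):
    mask = df.eq(name)
    # Per-column sum of the index values where the cell matches
    s = mask.mul(df.index.to_series(), axis=0).sum()
    # Decide "not found" from the mask, not from the sum,
    # so a match at index 0 is not mistaken for a missing value
    found = mask.any()
    return int(np.where(found, s, missing).sum())

print(index_sum(df1, "b.jpg"))  # 1 + 2 + 45 = 48
print(index_sum(df1, "a.jpg"))  # 0 + 0 + (0 + 1 + 2) = 3
```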

Upvotes: 1
