oymonk

Reputation: 353

append to dataframe raises Error: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

I have two folders of photos. The second folder supposedly consists entirely of duplicates of photos in the first folder. My job is to confirm that this is actually the case.

My script takes a photo from folder two and compares it to every photo in folder one. Each comparison produces a similarity value. If the similarity value is greater than 16 (indicating a positive match), a counter variable increases by one. Once a photo in folder two has been checked against all the photos in folder one, the counter is checked. If it is still zero, the photo gets added to a list. This part of the code works and I'm happy with it.
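The counting logic above can be sketched like this, where similarity() is a hypothetical stand-in for the SIFT/FLANN comparison and the file names are made up:

```python
# Hypothetical stand-in for the SIFT/FLANN comparison described above:
# returns a high score only when the folder-two photo matches a folder-one photo.
def similarity(photo_b, photo_a):
    return 20 if photo_b.replace("2.jpg", ".jpg") == photo_a else 0

folder_one = ["Lucy.jpg", "Henry.jpg"]
folder_two = ["Lucy2.jpg", "Stranger2.jpg"]

no_match = []
for b in folder_two:
    # count how many folder-one photos score above the match threshold
    positives = sum(1 for a in folder_one if similarity(b, a) > 16)
    if positives == 0:
        no_match.append(b)

print(no_match)  # ['Stranger2.jpg']
```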

The problem is that I also want to obtain a list of the near matches from folder one (i.e. photos whose similarity score with the photo from folder two is between 1 and 16) so that I can check these photos manually. I also want these results in dataframe format for easy rendering into a visual HTML page. Here's what I would like as an end result:

data = {'Photo': [r'C:\Lucy Maud in Garden.jpg', r'C:\Henry by car.jpg', r'C:\Lucy and Henry arms together.jpg', r'C:\Lucy Maud with dog.jpg'],
        'NearMatch': [r'C:\Lucy Maud in Garden2.jpg', r'C:\Henry by car2.jpg', r'C:\Lucy and Henry arms together2.jpg', r'C:\Lucy Maud with dog2.jpg'],
        'Similarity': [1, 2, 1, 11]}


df = pd.DataFrame(data, columns=['Photo', 'NearMatch', 'Similarity'])

Here is my code:

from __future__ import division

import cv2
import numpy as np
import glob
import pandas as pd

# SIFT and FLANN
sift = cv2.SIFT_create()


index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

# prep the counters and empty lists

countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
listOfDisimilarities = []

# Load all the images

folder1 = r"C:/ProbablyDups/**"
folder2 = r"C:/DefinitiveCopy/**"


extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)

#Compare photos in loops
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue

    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):
        if not b.lower().endswith(extensionsOnly):
            continue

        # every b that reaches this point is an image file
        countInner += 1

        (kp_2,desc_2) = siftOut2[b]

        matches = flann.knnMatch(desc_1, desc_2, k=2)

        good_points = []

        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)

        # use the larger keypoint count as the denominator
        number_keypoints = max(len(kp_1), len(kp_2))

        percentage_similarity = int(len(good_points) / number_keypoints * 100)
        # add a tick to the counter if there is a positive match
        if percentage_similarity > 16:
            countNoMatch += 1
        # part that is not working:
        if 0 < percentage_similarity < 16:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)
    
    if countNoMatch == 0:
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        zippedList =  list(zip(nearMatch,nearMatch2, listOfSimilarities))
        print(zippedList)
        nearMatch = []
        nearMatch2 = []
        final_df = pd.concat(zippedList, ignore_index=True)
    
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1
print(final_df)

df.to_csv(r"C:/Documents/NearMatch.csv")

What I have tried to do:

I tried to add a new check at the point of comparison: is this similarity score between 1 and 16? If yes, add the photo to the list nearMatch2. Then, when the loop completes, the code asks a new question: is the counter (which indicates a positive match greater than 16) still at zero? If yes, zip the following lists together: nearMatch2, nearMatch and listOfSimilarities (which holds the similarity scores).

The problem is that when everything is done, I get my data as a list of tuples and I don't know how to transform this into a dataframe. I've tried append, assign, loc, iloc and concat, but nothing works. With concat, I get the error: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid.
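A small reproduction of the error, with made-up file names: pd.concat only accepts Series and DataFrame objects, whereas the DataFrame constructor accepts a list of tuples directly.

```python
import pandas as pd

zipped = [("p1.jpg", "q1.jpg", 3), ("p2.jpg", "q2.jpg", 11)]

# pd.concat expects Series/DataFrame objects, so a list of tuples raises TypeError
try:
    final_df = pd.concat(zipped, ignore_index=True)
except TypeError as e:
    print(e)  # cannot concatenate object of type '<class 'tuple'>'; ...

# the DataFrame constructor handles a list of tuples directly
df = pd.DataFrame(zipped, columns=["Photo", "NearMatch", "Similarity"])
print(df)
```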

Upvotes: 2

Views: 199

Answers (1)

oymonk

Reputation: 353

Got it working - I found something called extend, which appends the elements of one list to another. Still not totally elegant, though - other solutions welcome.
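For reference, the difference between extend and append on lists (file names here are made up):

```python
nearMatchAgg = []

batch1 = ["Lucy.jpg"]
batch2 = ["Henry.jpg", "Maud.jpg"]

# extend appends each element of the batch;
# append would instead nest the whole list as a single item
nearMatchAgg.extend(batch1)
nearMatchAgg.extend(batch2)

print(nearMatchAgg)  # ['Lucy.jpg', 'Henry.jpg', 'Maud.jpg']
```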

from __future__ import division

import cv2
import numpy as np
import glob
import pandas as pd



# SIFT and FLANN
sift = cv2.SIFT_create()


index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

# prep the counters and empty lists

countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
nearMatchAgg = []
nearMatch2Agg = []
listOfSimilaritiesAgg = []
listOfDisimilarities = []


folder1 = r"/media/folderTwo/**"
folder2 = r"/media/folderOne/**"


extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)


for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue

    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):
        if not b.lower().endswith(extensionsOnly):
            continue

        # every b that reaches this point is an image file
        countInner += 1

        # print(countInner, "", countOuter, "", countNoMatch)

        # you don't need this when you are comparing two folders
        # if countInner <= countOuter:

        #     continue


        (kp_2,desc_2) = siftOut2[b]

        matches = flann.knnMatch(desc_1, desc_2, k=2)

        good_points = []

        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)

        # use the larger keypoint count as the denominator
        number_keypoints = max(len(kp_1), len(kp_2))

        percentage_similarity = int(len(good_points) / number_keypoints * 100)
        # print(percentage_similarity)
        if percentage_similarity > 16:
            countNoMatch += 1
        if 0 < percentage_similarity < 16:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)
    
    if countNoMatch == 0:
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        nearMatchAgg.extend(nearMatch)
        nearMatch2Agg.extend(nearMatch2)
        listOfSimilaritiesAgg.extend(listOfSimilarities)
        nearMatch = []
        nearMatch2 = []
        listOfSimilarities=[]
    
    zippedList = list(zip(nearMatchAgg,nearMatch2Agg, listOfSimilaritiesAgg))
    
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1
dfObj = pd.DataFrame(zippedList, columns=['Original', 'Title', 'Similarity'])

dfObj.to_csv(r"C:/Documents/PhotoResults.csv")

Upvotes: 2
