alexleahy123456

Reputation: 1

Trying to find duplicates in a 2D List (PYTHON)

I'm trying to find duplicates in a 2D list, where each inner list is a different row of a document. I want to find the words that are the same across rows.

def helper(a):
  for x in range(len(a)-1):
    for y in range(len(a[x])):
      for i in range(len(a)):
        for j in range(len(a[x])-1):
          if(a[x][y]==a[i][j]):
            if(x!=i and y!=j):
              print(a[i][j])

a = [["i", "will", "always", "be", "very", "happy"], ["happy", "people", "are", "cool", "very"]]

This only prints happy, when I want both happy and very to be printed. If I change the -1 in the for loops, I get an index out of bounds error.

Upvotes: 0

Views: 441

Answers (4)

Rajarshi Bandopadhyay

Reputation: 25

Okay, let's flatten this out using functools.reduce first, and then use the built-in set datatype to wipe duplicates out.

  1. Import the right function:
from functools import reduce
  2. Define the first part of the function, which makes the list 1D:
def helper(a):
    b = reduce(lambda x, y: x + y, a)
    # TODO: rest of the function

This gives us a 1D list, but it may still contain repeated elements. Essentially, we are concatenating all the member lists into one big list. Let me explain this part.

  • First, to recap, a lambda function is a way to define a nameless function in one line, often inside the argument list of a function call. It is written using the keyword lambda. That is what lambda x, y: x + y does here: it defines a lambda function that adds its two arguments. Note that when this function receives two lists as arguments, it returns a single list containing all their members.
  • Second, the reduce function from the functools library is called here with two arguments: a function (in this case, the lambda function that adds) and a list (or any iterable, if you know what those are). The function it takes must be a reducing function, that is, a function which takes two arguments and returns a single value. reduce applies that function to the first two members of the list, then applies it again to that result and the third member, then to the new result and the fourth member, and so on. In the end, it has reduced the entire list to a single value.
  • Finally, a 2D list is a list of lists: its members are all lists. When the lambda function above is applied to them, they get joined. This runs from the first two lists all the way to the last one, and we end up with a single list containing all the values in the original 2D list.
  3. Now we have to extract the unique values from this list. We will use the built-in set datatype for this. To obtain the unique values of a list L, use:
L_unique = list(set(L))
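
As a quick check of the two steps so far, here is a minimal sketch run on the example rows from the question. Writing the words as quoted strings is an assumption about the intended data, and the names flat and unique_words are only for illustration.

```python
from functools import reduce

a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]

# reduce applies the lambda pairwise: (a[0] + a[1]) + a[2] + ...
flat = reduce(lambda x, y: x + y, a)
print(flat)

# set() drops the repeats; sorted() only gives a stable order for display
unique_words = sorted(set(flat))
print(unique_words)
```

If a can be empty, passing [] as a third argument to reduce avoids the TypeError that reduce raises on an empty sequence with no initial value.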

Applying this inside our function:

def helper(a):
    b = reduce(lambda x, y: x + y, a)
    c = list( set(b) )
    # TODO: Print out the members

With that done, all that remains is to print everything in the new list c, which contains every unique value in a. Note that a set does not preserve order, so the values may print in any order.

To wit:

from functools import reduce

def helper(a):
    """
    Prints out unique values in the
    2D list named `a`
    """
    b = reduce(lambda x, y: x + y, a)
    c = list( set(b) )
    for item in c:
        print(item)
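
If the goal is the words that repeat rather than every distinct word, one possible variation is to count occurrences and keep those seen more than once. This is a sketch, not part of the function above: repeated_words is a hypothetical name, and it also counts a word repeated within a single row, which may or may not be what is wanted.

```python
from collections import Counter
from itertools import chain

def repeated_words(rows):
    """Return the words that occur more than once across all rows."""
    counts = Counter(chain.from_iterable(rows))
    # Counter remembers first-seen order, so the result follows the input order
    return [word for word, n in counts.items() if n > 1]

a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]
print(repeated_words(a))  # ['very', 'happy']
```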

If you wish to implement this with plain loops instead of builtins and libraries, then this version ought to do:

def helper(a):
    # 1. Linearize the 2D list
    b = []
    for item in a:
        b = b + item

    # 2. Print unique values in 1D list
    c = []
    for item in b:
        if item not in c:
            c.append(item)
            print(item)
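
The item in c check scans the list c every time, which is fine for short rows. For larger inputs, a possible variation keeps a set next to the list so that each lookup stays fast. This is only a sketch: unique_in_order is a hypothetical name, it does rely on the built-in set, and it assumes the items are hashable (strings are).

```python
def unique_in_order(rows):
    """Return each distinct item once, in first-seen order."""
    seen = set()      # fast membership checks
    result = []       # keeps the original order
    for row in rows:
        for item in row:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result

a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]
for item in unique_in_order(a):
    print(item)
```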

I am hoping that this is self-explanatory. If it is confusing, kindly comment on what you find difficult in this answer.

Upvotes: 0

0x0fba

Reputation: 1620

Here is a concise answer using a list comprehension, which makes it easy to build a list.

word for word in a[0] is quite explicit: it loops over the words of the first row.

if word in a[1] keeps only the words that also belong to the 2nd row.

duplicates = [word for word in a[0] if word in a[1]]
print(duplicates)  # ['very', 'happy']

The two in keywords are unrelated to each other. The 1st in is part of the for loop construct. The 2nd in is a membership operator.
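
To make the two roles of in easier to see, the comprehension can be unrolled into an explicit loop. This is just an equivalent sketch using the rows from the question.

```python
a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]

duplicates = []
for word in a[0]:        # first "in": part of the for loop
    if word in a[1]:     # second "in": membership test on the 2nd row
        duplicates.append(word)

print(duplicates)  # ['very', 'happy']
```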

Upvotes: 1

Tr3ate

Reputation: 163

a =[["i", "will", "always", "be", "very", "happy"],["happy","people", "are", "cool", "very"]]
for i in range(len(a)-1):
    res = set(a[i]) & set(a[i+1])
    print(res)

Using sets, you can achieve this with only one loop.
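
The loop above only compares each row with the one right after it. If words shared by any two rows are wanted, a possible extension is to intersect every pair of rows. This is a sketch under that assumption, and shared_between_rows is a hypothetical name.

```python
from itertools import combinations

def shared_between_rows(rows):
    """Map each pair of row indices to the set of words both rows contain."""
    shared = {}
    # combinations yields every unordered pair of (index, row) exactly once
    for (i, row_i), (j, row_j) in combinations(enumerate(rows), 2):
        common = set(row_i) & set(row_j)
        if common:
            shared[(i, j)] = common
    return shared

a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]
print(shared_between_rows(a))
```

With only two rows, as in the question, this gives the same single intersection as the loop above.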

Upvotes: 1

JohnyCapo

Reputation: 302

Your for loops are more complicated than they need to be. I would solve it like this:

same_words = []

for scanning_list in a:
    for scanned_list in a:
        if scanning_list == scanned_list:
            continue
        for scanning_item in scanning_list:
            if scanning_item in scanned_list and scanning_item not in same_words:
                print(scanning_item)
                same_words.append(scanning_item)
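
For comparison, the index-based loops from the question can also be repaired. In the original, the y != j condition skips very because it sits at index 4 in both rows, and the inner range uses len(a[x]) rather than the length of the row actually being scanned, which is what triggers the index error once the -1 is removed. A minimal sketch of a corrected version follows; find_shared_words is a hypothetical name.

```python
def find_shared_words(rows):
    """Return words that appear in at least two different rows."""
    shared = []
    for x in range(len(rows)):
        # start at x + 1 so each pair of rows is compared only once
        for i in range(x + 1, len(rows)):
            for y in range(len(rows[x])):
                # bound j by the row being scanned, not by rows[x]
                for j in range(len(rows[i])):
                    if rows[x][y] == rows[i][j] and rows[x][y] not in shared:
                        shared.append(rows[x][y])
    return shared

a = [["i", "will", "always", "be", "very", "happy"],
     ["happy", "people", "are", "cool", "very"]]
print(find_shared_words(a))  # ['very', 'happy']
```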

Upvotes: -1
