jroc

Reputation: 91

Text Mining: Query search

I have a dictionary:

{'Farage': [0, 5, 9, 192,233,341],
 'EU': [0, 1, 5, 6, 9, 23]}

Query1: “Farage” and “EU”
Query2: “Farage” or “EU”

I need to return the IDs of the documents that match each query. For Query1, for example, the answer should be [0, 5, 9]. I believe the solution should look something like the following, but in Python:

final_list = []
while x ≠ Null and y ≠ Null
    do if docID(x) = docID(y)
          then ADD(final_list, docID(x))
               x ← next(x)
               y ← next(y)
          else if docID(x) < docID(y)
                  then x ← next(x)
                  else y ← next(y)
return final_list

Please help.

Upvotes: 1

Views: 86

Answers (3)

mad_

Reputation: 8273

You can create a dict that maps operator names to set operations and apply them to get the final result. This assumes that queries strictly follow the pattern key1 operator key2 operator key3, and it evaluates them left to right with no operator precedence.

For an arbitrary number of arguments:

import operator
d1={'Farage': [0, 5, 9, 192,233,341],
    'EU': [0, 1, 5, 6, 9, 23],
    'hopeless': [0, 341, 19999]}

d={'and':operator.and_,
  'or':operator.or_}

Queries= ['Farage and EU','Farage and EU or hopeless','Farage or EU']

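for query in Queries:
    temp_arr = query.split()
    # Start from the first term's documents, then fold in each
    # "operator term" pair from left to right.
    # Seeding res here (instead of testing "if res:") keeps the result
    # correct when an intermediate intersection is empty.
    res = set(d1.get(temp_arr[0], []))

    for value in range(1, len(temp_arr), 2):
        op = temp_arr[value]
        k2 = temp_arr[value+1]
        res = d[op](res, set(d1.get(k2, [])))
    print(res)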

Output

set([0, 9, 5])
set([0, 192, 5, 233, 9, 19999, 341])
set([0, 192, 5, 6, 1, 233, 23, 341, 9])
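
Note that the loop above applies the operators strictly left to right, so 'hopeless or Farage and EU' is read as (hopeless or Farage) and EU. If you would rather have "and" bind tighter than "or", a minimal sketch could look like the following. The function name evaluate and its index parameter are my own choices for illustration, and it assumes that terms never contain the words "and" or "or":

```python
def evaluate(query, index):
    """Evaluate 'a and b or c' with 'and' binding tighter than 'or'.

    Hypothetical helper: `index` maps each term to a list of document IDs.
    """
    # Collapse repeated whitespace so splitting on ' or ' / ' and ' is reliable.
    normalized = ' '.join(query.split())
    result = set()
    # Each 'or' clause is an intersection of its 'and' terms;
    # the final answer is the union of all clauses.
    for clause in normalized.split(' or '):
        terms = clause.split(' and ')
        docs = set(index.get(terms[0], []))
        for term in terms[1:]:
            docs &= set(index.get(term, []))
        result |= docs
    return sorted(result)


index = {'Farage': [0, 5, 9, 192, 233, 341],
         'EU': [0, 1, 5, 6, 9, 23],
         'hopeless': [0, 341, 19999]}

print(evaluate('Farage and EU', index))
print(evaluate('hopeless or Farage and EU', index))
```

With this grouping the second query gives [0, 5, 9, 341, 19999], whereas strict left-to-right evaluation would give only the documents 0, 5 and 9.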

Upvotes: 1

s3n0

Reputation: 636

Bear in mind that you can convert the lists into sets and use the set operators:

>>> d = {'Farage': [0, 5, 9, 192, 233, 341] , 'EU': [0, 1, 5, 6, 9, 23]}
>>> d
{'EU': [0, 1, 5, 6, 9, 23], 'Farage': [0, 5, 9, 192, 233, 341]}
>>>
>>> set(d['EU']) | set(d['Farage'])
{0, 1, 192, 5, 6, 9, 233, 341, 23}
>>>
>>> set(d['EU']) & set(d['Farage'])
{0, 9, 5}
>>>
>>> set(d['EU']) ^ set(d['Farage'])
{192, 1, 23, 233, 341, 6}
>>>
>>> set(d['EU']) - set(d['Farage'])
{1, 6, 23}

Alternatively, if you can change the input format, store the values in the dictionary directly as sets:

>>> d = {'Farage': {0, 5, 9, 192, 233, 341}, 'EU': {0, 1, 5, 6, 9, 23}}
>>> d['EU'] & d['Farage']
{0, 9, 5}
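
For comparison, the merge from the question's pseudocode can also be written directly on the lists, without building sets. This is only a minimal sketch: it assumes both lists are sorted in ascending order (as they are in the example), and the name intersect_sorted is my own.

```python
def intersect_sorted(xs, ys):
    """Intersect two ascending lists of document IDs with a two-pointer merge."""
    i = j = 0
    final_list = []
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            # Same document ID in both lists: keep it and advance both pointers.
            final_list.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            # The smaller ID cannot appear later in the other list, so skip it.
            i += 1
        else:
            j += 1
    return final_list


d = {'Farage': [0, 5, 9, 192, 233, 341], 'EU': [0, 1, 5, 6, 9, 23]}
print(intersect_sorted(d['Farage'], d['EU']))
```

Unlike the set version, this keeps the result in sorted order and makes a single pass over both lists, but it gives wrong answers if either list is unsorted.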

Upvotes: 0

Vasilis G.

Reputation: 7844

You could write your own function using sets, a built-in Python structure that suits this case well because it makes unions and intersections of sequences fast:

def getResults(s, argument):
    s = list(s.values())
    if argument == 'OR':
        result = s[0]
        for elem in s[1:]:
            result = sorted(set(result).union(set(elem)))
        return result
    elif argument == 'AND':
        result = s[0]
        for elem in s[1:]:
            result = sorted(set(result).intersection(set(elem)))
        return result
    else:
        return None

inDict = {'Farage': [0, 5, 9, 192,233,341], 'EU': [0, 1, 5, 6, 9, 23]}

query1 = getResults(inDict, 'AND')
query2 = getResults(inDict, 'OR')

print(query1)
print(query2)

Results:

[0, 5, 9]
[0, 1, 5, 6, 9, 23, 192, 233, 341]

Note: You can drop the sorted call if you do not need sorted output; the function then returns a set rather than a list.
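
As a variation, set.intersection and set.union accept any number of sets, so the explicit loops can be replaced by a single call and the sorting done only once at the end. A minimal sketch of that idea follows; the name get_results and the choice to raise ValueError for an unknown operator are my own, not part of the function above:

```python
def get_results(index, argument):
    """Combine all posting lists in `index` with 'AND' or 'OR'."""
    sets = [set(ids) for ids in index.values()]
    if not sets:
        return []
    if argument == 'AND':
        # Unpack the list so every set takes part in one intersection.
        return sorted(set.intersection(*sets))
    if argument == 'OR':
        return sorted(set.union(*sets))
    raise ValueError('unknown operator: %r' % (argument,))


in_dict = {'Farage': [0, 5, 9, 192, 233, 341], 'EU': [0, 1, 5, 6, 9, 23]}

print(get_results(in_dict, 'AND'))
print(get_results(in_dict, 'OR'))
```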

Upvotes: 1
