jes516

Reputation: 552

Python: find the index of the first duplicate within a list

I want to iterate through a list to find the index at which the first item in the list is repeated for the first time. My results should print mylist[0:first_match].

Here is what I mean:

.APT 5B              APT 5B  .
.BUSINESS   JOEY     BUSINESS.
.                    1ST FL  .
.        NATE JR    SAM      .
.        JOE       7         .
.                            .
.2ND FLR TOM         2ND FLR .
.A1 2FL           APT 71E    .
.APT E205            APT 1R  .
.        CONSTRUCTION        .
.APT 640              APT 545.
.PART1   SYNC  PART2         .
.  NATE JR        SAM        .

The problem I'm running into is that the program keeps adding items to the dictionary even after the first match is found, so it appends data that I want to ignore.

Here is what I have:

dictt = {}
with open(path + 'sample33.txt', 'rb') as txtin:
        for line in txtin:
            part2 = line[1:29].split()
            uniq = []
            print '%r' % part2

            for key in part2:
                if key not in dictt:
                    dictt[key] = key
                    uniq.append(key)
            dictt = {}
            print ' '.join(uniq)

Results:

['APT', '5B', 'APT', '5B']
APT 5B
['BUSINESS', 'JOEY', 'BUSINESS']
BUSINESS JOEY
['1ST', 'FL']
1ST FL
['NATE', 'JR', 'SAM']
NATE JR SAM
['JOE', '7']
JOE 7
[]

['2ND', 'FLR', 'TOM', '2ND', 'FLR']
2ND FLR TOM
['A1', '2FL', 'APT', '71E']
A1 2FL APT 71E
['APT', 'E205', 'APT', '1R']
APT E205 1R          # Would like to stop adding items after first 'APT' match
['CONSTRUCTION']
CONSTRUCTION
['APT', '640', 'APT', '545']
APT 640 545          # same here...
['PART1', 'SYNC', 'PART2']
PART1 SYNC PART2
['NATE', 'JR', 'SAM']
NATE JR SAM
[Finished in 0.1s]

I hope I have explained this correctly and that someone can fine-tune it.

Thank you.

Edit #1: here is an example of what I would like to print:

listt:
    ['APT', '640', 'APT', '1', '2', '3']

found 'APT' match so:

print:
    APT 640

ignore ...'APT', '1', '2', '3']
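
The behaviour described in Edit #1 can be sketched as a small helper. This is only an illustration, not code from the question: the name head_until_first_repeat is made up, and it is written for Python 3 (print as a function), unlike the Python 2 snippets in this thread. It uses the optional start argument of list.index so that the first token does not match itself.

```python
def head_until_first_repeat(tokens):
    """Return the tokens that precede the first repeat of tokens[0].

    If the first token never reappears (or the list is empty), a copy of
    the whole list is returned.
    """
    if not tokens:
        return []
    try:
        # Search from index 1 so the first token does not match itself.
        cut = tokens.index(tokens[0], 1)
    except ValueError:
        # The first token is never repeated: keep everything.
        return list(tokens)
    return tokens[:cut]


print(' '.join(head_until_first_repeat(['APT', '640', 'APT', '1', '2', '3'])))
# prints: APT 640
```

Only a repeat of the first item triggers the cut here; a duplicate of some later token is left alone.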

Upvotes: 0

Views: 781

Answers (3)

leshurex

Reputation: 23

I'm not sure I completely understand what you need, but this can be useful.

import os

def read_text(name_file, string):

    index_found = [0, 0]
    result = [0, 0]

    # Flatten the file into a single list of whitespace-separated words.
    with open(name_file) as f:
        read_temp = [word for line in f for word in line.split()]

    for index_str, s in enumerate(read_temp):
        if string in s:
            # Remember the first match and the word that follows it, then stop.
            index_found[0] = index_str
            index_found[1] = index_str + 1
            break

    result[0] = read_temp[index_found[0]]
    result[1] = read_temp[index_found[1]]

    return result

os.chdir('Path to your .txt')

result_list = read_text("your_file.txt", "APT") # "APT" or whatever string you need to find.

print result_list

Output:

['APT', '5B']

Upvotes: 0

Hackaholic

Reputation: 19763

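Here you go:

>>> import re
>>> f = open('your_file.txt')
>>> for x in f:
        line = re.findall(r'\w+', x.strip())
        print line
        try:
            # slice up to the first repeat of line[0]
            print " ".join(line[:line[1:].index(line[0])+1])
        except (ValueError, IndexError):
            # no repeat (ValueError) or empty line (IndexError)
            print " ".join(line)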

output:

['APT', '5B', 'APT', '5B']
APT 5B
['BUSINESS', 'JOEY', 'BUSINESS']
BUSINESS JOEY
['1ST', 'FL']
1ST FL
['NATE', 'JR', 'SAM']
NATE JR SAM
['JOE', '7']
JOE 7
[]

['2ND', 'FLR', 'TOM', '2ND', 'FLR']
2ND FLR TOM
['A1', '2FL', 'APT', '71E']
A1 2FL APT 71E
['APT', 'E205', 'APT', '1R']
APT E205                  # not printing after match
['CONSTRUCTION']
CONSTRUCTION
['APT', '640', 'APT', '545']
APT 640                   # not printing after match
['PART1', 'SYNC', 'PART2']
PART1 SYNC PART2
['NATE', 'JR', 'SAM']
NATE JR SAM
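
For comparison, here is a sketch that keeps the loop structure from the question and simply breaks at the first token that has already been seen. It is an assumption-laden illustration rather than part of the answer above: the name unique_prefix is hypothetical, the token lists are copied from the question's results, and it targets Python 3.

```python
def unique_prefix(tokens):
    """Collect tokens in order, stopping at the first token already seen."""
    seen = set()
    prefix = []
    for token in tokens:
        if token in seen:
            break  # first duplicate found: ignore the rest of the line
        seen.add(token)
        prefix.append(token)
    return prefix


# Token lists taken from the question's printed results.
for tokens in (['APT', 'E205', 'APT', '1R'],
               ['2ND', 'FLR', 'TOM', '2ND', 'FLR'],
               ['PART1', 'SYNC', 'PART2']):
    print(' '.join(unique_prefix(tokens)))
```

Note the difference from the slicing approach: this stops at the first repeat of any token, not only a repeat of the first one. On the sample data the two give the same lines.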

Upvotes: 1

Carlos

Reputation: 1935

If your concern is removing duplicate entries from your list, then "set" is there to rescue you.

uniqlist = list(set(dupelist))

I should also mention there is another article that references the ability to remove duplicates from a list.

Python unique list using set
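
A short sketch of what de-duplication does to one of the question's lines, assuming Python 3.7 or later (where dict preserves insertion order). The variable names are illustrative only. A set does not promise any particular order, so an order-preserving variant with dict.fromkeys is shown alongside it.

```python
tokens = ['APT', '640', 'APT', '545']

# set() drops duplicates but does not promise any particular order.
as_set = set(tokens)

# dict.fromkeys keeps first-seen order (guaranteed from Python 3.7 on).
ordered_unique = list(dict.fromkeys(tokens))

print(' '.join(ordered_unique))
# prints: APT 640 545
```

Either way, every distinct token is kept, which gives the "APT 640 545" line the question wants to avoid; de-duplication alone does not cut the list off at the first repeat.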

Upvotes: 0
