Rao Sunny

Reputation: 99

Looking for a function that helps me avoid duplicate entries in a text file

I am facing a minor issue that I have not been able to solve yet.

I have a text file in which a couple of entries are duplicated, and I want to remove the duplicates.

Here is the text file data:

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav    system_line_number

<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\17_2021_01_28_14_46_57_line 2.wav    system_line_number

<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number

In the example above, <line 2> is duplicated. I want each tag to appear only once, and a duplicate tag should be removed together with the line below it ("some sentences...").

The expected output looks like this:

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav    system_line_number

<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address

<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number

That means <line 2> appears only once.

Upvotes: 1

Views: 59

Answers (3)

Ahmed

Reputation: 924

Let's write a general extractor that removes the duplicates from your file, using Python regular expressions.

Code Syntax

import re

def extractor(path):
    result = []        # lines to keep, in their original order
    seen = set()       # tags that have already been emitted
    skip_next = False  # True right after a duplicate tag was found
    with open(path) as file:
        for raw in file:
            line = raw.rstrip('\n')
            if skip_next:
                # Drop the path line that belongs to the duplicate tag.
                skip_next = False
                continue
            match = re.search(r"<\w+\s\d>", line)
            if match is None:
                result.append(line)
            elif match.group(0) not in seen:
                seen.add(match.group(0))
                result.append(match.group(0))
            else:
                # Duplicate tag: drop it and the line that follows it.
                skip_next = True
    return result


## ----- main Execution ----- ##
for line in extractor('read_text_extraction3.txt'):
    print(line)

Output

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav    system_line_number


<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address


<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number


[Program finished]
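
One thing worth checking is which tags the pattern `<\w+\s\d>` actually recognises. The sketch below compares it with a wider pattern; `WIDE` is a hypothetical alternative that is not part of the answer, and the tag list is only illustrative.

```python
import re

NARROW = re.compile(r"<\w+\s\d>")  # pattern used in the answer above
WIDE = re.compile(r"<[^<>]+>")     # hypothetical wider pattern: any <...> tag

tags = ["<line 2>", "<line 10>", "<United Kingdom>", "<enter POI please>"]

# fullmatch requires the whole string to fit the pattern.
narrow_hits = [t for t in tags if NARROW.fullmatch(t)]
wide_hits = [t for t in tags if WIDE.fullmatch(t)]

print(narrow_hits)
print(wide_hits)
```

The narrow pattern only fits a single word followed by a single digit, so duplicates of tags such as <United Kingdom> or <line 10> would pass through unnoticed. If every tag should be deduplicated, a wider pattern along these lines may be a better fit.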

Upvotes: 1

pakpe

Reputation: 5479

You read the lines into a list, then iterate over the list in steps of 3 (tag line, path line, blank separator) and add the tag and its path line to the unique list if the tag is not already present in that list.

with open('scores.txt') as infile:  # avoid shadowing the built-in input()
    lines = [line.strip() for line in infile]
unique = []
for i in range(0,len(lines),3):  # each entry spans 3 lines: tag, path, blank
    if lines[i] not in unique:
        unique.append(lines[i])
        unique.append(lines[i+1])

print(unique)
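
If the spacing between entries is not perfectly regular, a fixed step can drift out of sync. A minimal sketch of an alternative, assuming entries are separated by at least one blank line and that the first occurrence of each tag is the one to keep; `dedupe_blocks` and the shortened sample paths are hypothetical names used only for illustration.

```python
import re

def dedupe_blocks(text):
    """Drop every block whose tag line was already seen, keeping the first."""
    seen = set()
    kept = []
    # A block is a run of non-blank lines; blocks are split on blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        tag = lines[0].strip()
        if tag in seen:
            continue  # duplicate tag: skip the whole block
        seen.add(tag)
        kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"

sample = (
    "<line 4>\nENU a.wav    system_line_number\n\n"
    "<line 2>\nENU b.wav    system_line_number\n\n"
    "<line 2>\nENU c.wav    system_line_number\n\n"
    "<line 1>\nENU d.wav    system_line_number\n"
)
result = dedupe_blocks(sample)
print(result)
```

Using a set for the seen tags keeps the membership test cheap even for large files, and the blank-line split means a stray extra blank line does not shift which line is treated as the tag.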

Upvotes: 1

Aven Desta

Reputation: 2443

You can read the file two lines at a time and store the values in a list. Then remove the duplicates from the list, and finally write the new list to a file.

line_list = []
with open('file.txt','r') as f: # replace file.txt with your text file name
  while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line1: break # end of file: stop before appending an empty entry
    line_list.append(line1+line2)
# A pair counts as a duplicate only when both of its lines are identical.
new_list = list(dict.fromkeys(line_list)) # removes duplicates from line_list, keeping order
print("".join(new_list))
# Here you need to write new_list into another file

OUTPUT

<line 4>
some sentences... 
<line 2>
some sentences...
<line 1>
some sentences...
<line 3>
some sentences...
<line 5>
some sentences...
<line 7>
some sentences...
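
The last step, writing the result to another file, could be sketched as follows. This assumes the same two-lines-per-entry layout as the output above; `write_unique_pairs` and the temporary file names are hypothetical and only there to make the example runnable.

```python
import os
import tempfile

def write_unique_pairs(src_path, dst_path):
    """Copy src to dst two lines at a time, skipping pairs already written."""
    seen = set()
    with open(src_path) as src, open(dst_path, "w") as dst:
        while True:
            pair = src.readline() + src.readline()
            if not pair:
                break  # both reads returned "", so the file is exhausted
            if pair not in seen:
                seen.add(pair)
                dst.write(pair)

# Small demonstration with a throwaway input file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "w") as f:
        f.write("<line 4>\nsome sentences...\n"
                "<line 2>\nsome sentences...\n"
                "<line 2>\nsome sentences...\n")
    write_unique_pairs(src_path, dst_path)
    with open(dst_path) as f:
        output = f.read()

print(output)
```

Writing each pair as soon as it is known to be new avoids holding the whole file in a list, at the cost of keeping the set of seen pairs in memory.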

Upvotes: 1
