Reputation: 99
I am facing a minor issue that I have not been able to solve yet.
I have a text file in which a couple of entries are duplicated, and I want to keep only one copy of each.
Here is the text file data:
<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav navigation_destination_country
<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav system_navigation_sdsmenu
<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav navigation_destination_poi
<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav navigation_destination_poi
<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi
<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav navigation_destination_poi
<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav navigation_destination_poi
<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav navigation_last_destinations
<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav system_navigation_sdsmenu
<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav navigation_destination_country
<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav navigation_destination_poi_slot_only
<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav system_line_number
<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav system_line_number
<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav system_line_number
<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav system_line_number
<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address
<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\17_2021_01_28_14_46_57_line 2.wav system_line_number
<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav system_line_number
In the above example, <line 2> appears more than once. I want to remove the repeated <line 2> tags together with the line that follows each of them.
The expected output looks like this:
<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav navigation_destination_country
<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav system_navigation_sdsmenu
<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav navigation_destination_poi
<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav navigation_destination_poi
<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi
<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav navigation_destination_poi
<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav navigation_destination_poi
<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav navigation_last_destinations
<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav system_navigation_sdsmenu
<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav navigation_destination_country
<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav navigation_destination_poi_slot_only
<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav system_line_number
<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav system_line_number
<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav system_line_number
<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address
<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav system_line_number
That means <line 2> appears only once.
Upvotes: 1
Views: 59
Reputation: 924
Let's add a general extractor that removes the duplicate tags from your file, using Python regular expressions.
Code Syntax
import re

def extractor(path):
    result = []        # lines to keep, in their original order
    seen = set()       # tags that have already been kept
    skip_next = False  # True when the next line belongs to a duplicate tag
    with open(path) as file:
        for raw in file:
            line = raw.rstrip('\n')
            if skip_next:  # drop the path line that follows a duplicate tag
                skip_next = False
                continue
            tag = re.search(r"<\w+\s\d>", line)
            if tag is None:
                result.append(line)
            elif tag.group(0) in seen:
                skip_next = True  # drop the duplicate tag and the line after it
            else:
                seen.add(tag.group(0))
                result.append(line)
    return result

## ----- main Execution ----- ##
for line in extractor('read_text_extraction3.txt'):
    print(line)
Output
<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav navigation_destination_country
<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav system_navigation_sdsmenu
<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav navigation_destination_poi
<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav navigation_destination_poi
<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi
<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav navigation_destination_poi
<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav navigation_destination_poi
<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav navigation_last_destinations
<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav system_navigation_sdsmenu
<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav navigation_destination_country
<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav navigation_destination_poi_slot_only
<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav system_line_number
<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav system_line_number
<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav system_line_number
<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address
<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav system_line_number
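One thing worth checking is which tags the pattern actually treats as candidates for deduplication. The snippet below is a minimal sketch that runs the pattern from the extractor against a few tags from the question, plus `<line 12>`, which is a made-up tag that does not appear in the file. The second pattern, `ANY_TAG_PATTERN`, is my own assumption about what a broader rule could look like; it is not part of the answer above.

```python
import re

# Pattern copied from the extractor above: one word, one whitespace, one digit.
TAG_PATTERN = re.compile(r"<\w+\s\d>")

# Hypothetical broader pattern (an assumption): any text between angle brackets.
ANY_TAG_PATTERN = re.compile(r"<[^<>]+>")

# "<line 12>" is an invented example; the others come from the question.
samples = ["<line 2>", "<line 12>", "<United Kingdom>", "<charging station>"]

for sample in samples:
    narrow = TAG_PATTERN.search(sample) is not None
    broad = ANY_TAG_PATTERN.search(sample) is not None
    print(f"{sample}: narrow={narrow}, broad={broad}")
```

With the narrow pattern only single-digit tags such as `<line 2>` are deduplicated, so a repeated `<charging station>` would pass through untouched. If every tag should be unique, something like the broader pattern may be a better fit.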
Upvotes: 1
Reputation: 5479
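Read the lines into a list, then iterate over the list in steps of 2 and add each tag, together with the line that follows it, to the unique list if the tag is not already present in that list.
with open('scores.txt') as infile:
    lines = [line.strip() for line in infile]

unique = []
# Step 2: every record is a tag line followed by one data line.
# Stopping at len(lines) - 1 avoids an IndexError on a trailing unpaired line.
for i in range(0, len(lines) - 1, 2):
    if lines[i] not in unique:
        unique.append(lines[i])
        unique.append(lines[i+1])
print(unique)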
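For larger files, checking membership in a list gets slower as the list grows. Here is a rough sketch of the same pairing idea with a set of already-seen tags, assuming the file strictly alternates between a tag line and its data line. The function name `unique_pairs` and the shortened sample lines are made up for illustration; they are not the real paths from the question.

```python
def unique_pairs(lines):
    """Keep the first (tag, data line) pair for every tag.

    Assumes `lines` alternates strictly: tag line, then its data line.
    """
    seen = set()
    kept = []
    # Pair even-indexed lines (tags) with odd-indexed lines (data).
    # zip stops at the shorter slice, so a trailing unpaired line is dropped.
    for tag, data in zip(lines[0::2], lines[1::2]):
        if tag not in seen:
            seen.add(tag)
            kept.extend([tag, data])
    return kept


# Shortened, hypothetical stand-ins for the lines in the question.
sample = [
    "<line 4>", "ENU 13_line 4.wav",
    "<line 2>", "ENU 14_line 2.wav",
    "<line 2>", "ENU 15_line 2.wav",
    "<line 1>", "ENU 18_line 1.wav",
]
print(unique_pairs(sample))
```

This keeps the first occurrence of each tag, so the pair with `14_` survives and the one with `15_` is dropped.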
Upvotes: 1
Reputation: 2443
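You can read the file two lines at a time and store each tag together with the line that follows it. Keying the pairs on the tag line removes the duplicates even when the second lines differ (as the .wav paths in the question do). Finally, write the result to a file.
pairs = {}  # tag line -> tag line + following line; a dict preserves insertion order
with open('file.txt', 'r') as f:  # replace file.txt with your text file name
    while True:
        line1 = f.readline()
        line2 = f.readline()
        if not line2:  # end of file, or a trailing tag with no line after it
            break
        pairs.setdefault(line1, line1 + line2)  # keeps only the first pair seen for each tag
new_list = list(pairs.values())
print("".join(new_list))
# Here you need to write new_list into another file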
OUTPUT
<line 4>
some sentences...
<line 2>
some sentences...
<line 1>
some sentences...
<line 3>
some sentences...
<line 5>
some sentences...
<line 7>
some sentences...
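The last step, writing the result to another file, could look roughly like this. It is only a sketch: `write_lines` and the output name `deduplicated.txt` are hypothetical, and the short `new_list` here is a stand-in for the list built by the code above, where every item already ends with a newline.

```python
def write_lines(lines, out_path):
    """Write newline-terminated strings to out_path, replacing any existing file."""
    with open(out_path, "w") as out_file:
        out_file.writelines(lines)  # writelines does not add newlines itself


# Stand-in for new_list: each item holds a tag line and the line after it.
new_list = ["<line 4>\nsome sentences...\n", "<line 2>\nsome sentences...\n"]
write_lines(new_list, "deduplicated.txt")  # hypothetical output file name
```

Writing to a separate file rather than overwriting the input keeps the original data around in case the deduplication rule needs adjusting.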
Upvotes: 1