Reputation: 16
I am new to Python. I have to remove duplicate data based on a condition: if a symb value matches the symb value of an earlier row, remove the duplicate row together with its associated data (dtype, iotype, bias, slpe, unit, max, min, client_no, path, ends). I have attached a file; the link is below.
Upvotes: -1
Views: 169
Reputation: 581
You could use the pandas.DataFrame.drop_duplicates method to remove the duplicates after loading the file into a DataFrame:
import pandas as pd  # pandas provides DataFrame.drop_duplicates
input_path = "sample.txt"  # input file
output_path = "sample_original.txt"  # output file
# Load the file into a DataFrame. sep takes one delimiter: use "\t" for tab-separated data or "," for comma-separated data
df = pd.read_csv(input_path, sep="\t")
# subset=None compares whole rows; pass subset=["symb"] to compare only the symb column (assuming that is the column name)
# inplace=True modifies df itself, not the file on disk; drop it to get a new DataFrame back instead
df.drop_duplicates(subset=None, inplace=True)
# Export the content without duplicates to the output file
df.to_csv(output_path, index=False)
I assume your file is a plain-text (.txt) file, as I was able to open it in Notepad.
Here is the link to the pandas documentation:
pandas.DataFrame.drop_duplicates
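Since the question asks to drop rows whose symb value repeats, here is a minimal sketch of that variant. It assumes the column is literally named symb and that the file is tab-separated; the sample rows and the helper name drop_duplicate_symb are made up for illustration and are not taken from the attached file.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file's columns and values may differ.
SAMPLE = (
    "symb\tdtype\tiotype\tunit\n"
    "TEMP1\tfloat\tin\tC\n"
    "PRES1\tfloat\tin\tbar\n"
    "TEMP1\tint\tout\tK\n"
)


def drop_duplicate_symb(df, column="symb"):
    """Return a new DataFrame keeping only the first row for each value in `column`."""
    # subset restricts the comparison to one column; keep="first" retains the
    # earliest occurrence and discards later rows with the same value.
    return df.drop_duplicates(subset=[column], keep="first").reset_index(drop=True)


# io.StringIO stands in for the real file path so the sketch runs on its own.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")
deduped = drop_duplicate_symb(df)
print(deduped)
```

The second TEMP1 row is removed along with all of its associated columns, even though its other values differ from the first TEMP1 row. Use keep="last" instead if the later row should win.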
Upvotes: 1