Debi

Reputation: 16

Remove duplicate content from a file using Python, based on header

I am new to Python. I have to remove duplicate data based on a condition: if a symb value matches a symb value that has already appeared, remove the duplicate symb entry and its associated data (dtype, iotype, bias, slpe, unit, max, min, client_no, path, ends). I have attached a file.

File link below:

Download the file from here.

Upvotes: -1

Views: 169

Answers (1)

Huzefa Sadikot

Reputation: 581

You could use the pandas.DataFrame.drop_duplicates method to remove duplicates from the file:

import pandas as pd     # Import pandas
input_file = "sample.txt"    # Input file
output_file = "sample_original.txt"    # Output file
# Load the file into a DataFrame. Use sep="\t" for tab-separated data or
# sep="," for comma-separated data, depending on how the text is delimited.
df = pd.read_csv(input_file, sep="\t")
# Drop duplicate rows. subset=None compares all columns; pass a list such as
# subset=["symb"] to compare only that column (assuming it is named symb).
# inplace=True modifies df itself, not the file on disk; omit it to get a
# new DataFrame back instead.
df.drop_duplicates(subset=None, inplace=True)
df.to_csv(output_file, index=False)  # Write the de-duplicated rows to the output file (comma-separated by default)

I assume your file is a plain-text (.txt) file, as I was able to open it in Notepad.
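Since the question asks for duplicates to be judged by symb alone, here is a minimal sketch of that variant. It assumes the file is tab-separated with one header row and a column literally named symb; the sample rows and the helper name drop_duplicate_symb are made up for illustration and are not taken from the attached file.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file's layout is an assumption.
SAMPLE = (
    "symb\tdtype\tunit\n"
    "TEMP1\tfloat\tC\n"
    "PRES1\tfloat\tbar\n"
    "TEMP1\tint\tK\n"
)


def drop_duplicate_symb(source, sep="\t"):
    """Return a DataFrame that keeps only the first row for each symb value."""
    df = pd.read_csv(source, sep=sep)
    # subset=["symb"] compares only the symb column, so a later row with the
    # same symb is dropped even if its other columns differ.
    return df.drop_duplicates(subset=["symb"], keep="first")


deduped = drop_duplicate_symb(io.StringIO(SAMPLE))
print(deduped.to_csv(sep="\t", index=False))
```

For a real file, pass its path instead of the StringIO object. With keep="first" the earliest occurrence of each symb survives; keep="last" would keep the latest one instead.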

Here is the link to the pandas documentation:

pandas.DataFrame.drop_duplicates

Upvotes: 1
