Reputation: 2513
My code is below:
import os

import pandas as pd

indexing_file_path = 'indexing.csv'
if not os.path.exists(indexing_file_path):
    df = pd.DataFrame([['1111', '20200101', '20200101'],
                       ['1112', '20200101', '20200101'],
                       ['1113', '20200101', '20200101']],
                      columns=['nname', 'nstart', 'nend'])
else:
    df = pd.read_csv(indexing_file_path, header=0)
print(df)
df.loc[len(df)] = ['1113', '20200202', '20200303']
# the append() method does not work either
print(df)
df.drop_duplicates('nname', keep='last', inplace=True)
print(df)
df.to_csv(indexing_file_path, index=False)
I want to keep the nname column unique in this file.
The first time the code runs, it saves the records to the csv file correctly, even though 1113 was added twice.
The second time it runs, it saves two 1113 rows to the csv file, because the DataFrame is now created from the csv file.
From the third run onward, the file always keeps two 1113 rows.
For now I have a workaround:
1. Save to the csv file with two 1113 rows.
2. Read the csv file again.
3. Call drop_duplicates again.
4. Save to the csv file again.
Why does a DataFrame created from a csv file behave so differently?
How can I save only the unique rows to the csv file in one pass?
Upvotes: 0
Views: 431
Reputation: 2513
I can answer my own question now.
The reason is:
When the DataFrame is created from a csv file, pandas infers the nname column as integers.
But when I add the 1113 row again, pandas stores the new row's nname as a string. The integer 1113 is not equal to the string '1113', so drop_duplicates keeps both rows.
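To make the type mismatch visible, here is a minimal sketch. It assumes an in-memory csv_text string standing in for indexing.csv, and the dates in the appended row are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for the csv file written by an earlier run (assumption)
csv_text = (
    "nname,nstart,nend\n"
    "1111,20200101,20200101\n"
    "1112,20200101,20200101\n"
    "1113,20200202,20200303\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=0)
dtype_after_read = df["nname"].dtype
print(dtype_after_read)  # pandas inferred an integer dtype for nname

# The appended row holds strings, so nname now mixes integers and strings
df.loc[len(df)] = ["1113", "20200404", "20200505"]
print(df["nname"].map(type).tolist())

# The integer 1113 and the string '1113' are different values,
# so neither row is treated as a duplicate of the other
deduped = df.drop_duplicates("nname", keep="last")
print(deduped)
```

Printing the element types this way is a quick check whenever drop_duplicates seems to ignore rows that look identical.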
The solution is:
Read the csv file with every column as a string.
df = pd.read_csv(indexing_file_path, header=0, dtype=str)
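As a rough check that this holds across repeated runs, the sketch below wraps the script in a hypothetical run_once helper and calls it three times against a temporary file. The helper name and the temporary directory are assumptions for illustration, not part of the original script.

```python
import os
import tempfile

import pandas as pd


def run_once(path):
    """One run of the script: load or create, add a row, deduplicate, save."""
    if not os.path.exists(path):
        df = pd.DataFrame(
            [["1111", "20200101", "20200101"],
             ["1112", "20200101", "20200101"],
             ["1113", "20200101", "20200101"]],
            columns=["nname", "nstart", "nend"],
        )
    else:
        # dtype=str keeps every column as text, matching the appended row
        df = pd.read_csv(path, header=0, dtype=str)
    df.loc[len(df)] = ["1113", "20200202", "20200303"]
    df = df.drop_duplicates("nname", keep="last")
    df.to_csv(path, index=False)
    return df


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indexing.csv")  # temporary stand-in for the real file
    for _ in range(3):
        result = run_once(path)
    print(result)
```

An alternative with the same effect would be to convert only the key column after reading, for example with df['nname'] = df['nname'].astype(str), if the other columns should keep their inferred types.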
Upvotes: 1