Reputation: 1
I'm trying to read a CSV into a dataframe, sort it by a column, and then write the sorted dataframe to a new CSV. The problem is that my output CSV looks nothing like the sorted dataframe: data ends up in the wrong columns, and so on. I suspect the data itself is the cause, since some columns contain long strings that may include special characters; when I strip out certain columns, the steps below work. Exporting and re-importing the dataframe as a dictionary or as a pickle also works perfectly.
First I read in a CSV file and sort it by a column. (The CSV files I used can be downloaded from the comment below; they are under 100 KB in size.)
import pandas as pd
df = pd.read_csv("database.csv", encoding="ISO-8859-1")
sorteddf = df.sort_values(by="All Comment Score")
This shows how the dataframe looks after sorting (what I want).
Then I store my dataframe in a new CSV file and read that new CSV as a new dataframe:
sorteddf.to_csv("test.csv")
newdf = pd.read_csv("test.csv",encoding = "ISO-8859-1")
However, when I read the newly written CSV file into a new dataframe, the columns and the data are a mess. This shows how the dataframe imported from the output CSV actually looks.
I would really appreciate it if someone could shed some light on my problem and point me in the right direction!
Upvotes: 0
Views: 2633
Reputation: 7903
You have decoding/encoding issues. Your encoding is not "ISO"; it's 'latin-1'. It's hard to fix this unless you figure out why you are reading in your data like this.
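One way to check whether encoding is involved is to write and read the file with the same explicit encoding and compare that with the default write. The sketch below is only an illustration: the two-row dataframe and the "Comment" column are made up, not taken from the asker's file, and it assumes the mismatch comes from `to_csv` writing UTF-8 by default while `read_csv` is told to decode as ISO-8859-1.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real data: one non-ASCII character
# and an embedded comma inside a text column.
df = pd.DataFrame({
    "Comment": ["caf\u00e9, au lait", "plain text"],
    "All Comment Score": [2, 1],
})
sorted_df = df.sort_values(by="All Comment Score")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")

    # Write and read with the same encoding: text survives unchanged.
    sorted_df.to_csv(path, index=False, encoding="ISO-8859-1")
    matched = pd.read_csv(path, encoding="ISO-8859-1")

    # Write with the default (UTF-8), read as ISO-8859-1: text is garbled.
    sorted_df.to_csv(path, index=False)
    mismatched = pd.read_csv(path, encoding="ISO-8859-1")

print(matched["Comment"].tolist())
print(mismatched["Comment"].tolist())
```

In this small example the mismatched read garbles the accented character but leaves the columns where they belong, so if whole values are landing in the wrong columns, the encoding may not be the only thing to look at.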
Upvotes: 0
Reputation: 881
Are you talking about the unnamed column?
Try using
sorteddf.to_csv('test.csv', index=False)
This tells pandas not to write the built-in index column, which most of the time you don't care about.
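To see the difference, here is a minimal sketch with a made-up two-row dataframe (the "name" column is hypothetical). It writes the CSV to a string instead of a file so it runs anywhere, and compares the columns you get back with and without the index.

```python
import io

import pandas as pd

# Hypothetical data; sorting leaves the original row labels (1, 0) as the index.
df = pd.DataFrame({"name": ["b", "a"], "All Comment Score": [2, 1]})
sorted_df = df.sort_values(by="All Comment Score")

# to_csv() with no path returns the CSV text as a string.
with_index = sorted_df.to_csv()
without_index = sorted_df.to_csv(index=False)

# Reading the version that includes the index adds an "Unnamed: 0" column.
reread_with = pd.read_csv(io.StringIO(with_index))
# Reading the version without the index gives back only the real columns.
reread_without = pd.read_csv(io.StringIO(without_index))
# Alternatively, keep the index in the file and restore it on read.
reread_restored = pd.read_csv(io.StringIO(with_index), index_col=0)

print(list(reread_with.columns))
print(list(reread_without.columns))
print(reread_restored.index.tolist())
```

If you do want to keep the original row labels, `index_col=0` on the read side is the counterpart to leaving the index in the file.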
Upvotes: 1