Anthony Ter
Anthony Ter

Reputation: 143

Reading CSV creates too many rows/columns

I am working with dataframes in the pandas library. I have a table of data in Excel that I save as a CSV then I call

df = pd.read_csv("file.csv")

I expect the frame to look something like

   Item1  Item2  Item3
0  12.00      3      2
1   4.00      8      4
2   3.14      2      8

But instead I get

   Item1  Item2  Item3 Unnamed: 3 Unnamed: 4
0  12.00      3      2        NaN        NaN
1   4.00      8      4        NaN        NaN
2   3.14      2      8        NaN        NaN

Or sometimes extra rows with all NaN values. It appears that pandas is not aware of the real size of the CSV. The data in Excel is organized perfectly fine, the data values are all nonempty and are entirely in a rectangle. How do I fix this? Is there an edit I can make to the CSV that will specify the its correct size?

As requested here is a snippet of the data. It goes down to about 2500 rows, and there are no more values to the right.

Upvotes: 3

Views: 2287

Answers (1)

villoro
villoro

Reputation: 1549

You probably have a cell that is not empty (for example a space ) in the original Excel file. If you are getting 2 unnamed columns in pandas, try to delete 2 columns of the original Excel file.

Another way would be to keep all columns that are not unnamed. You could do this with:

real_cols = [x for x in df.columns if not x.startswith("Unnamed: ")]
df = df[real_cols]

And then you can save the csv.

Upvotes: 1

Related Questions