vasili111

Reputation: 6940

Problems with loading csv with pandas

My code:

raw_data = pd.read_csv("C:/my.csv")

After I run it, the file is loaded, but I get this warning:

C:\Users\user\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3051: DtypeWarning: Columns (0,79,237,239,241,243,245,247,248,249,250,251,252,253,254,255,256,258,260,262,264) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

Questions:

  1. What exactly does it mean?
  2. How do I fix it?

Sorry, I cannot share the data.

Upvotes: 1

Views: 510

Answers (4)

Gess123

Reputation: 61

By default, pandas reads the whole file into memory. If your CSV is large, that can be a problem.

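import pandas as pd

chunks = []
# Read the file 1000 rows at a time and collect the pieces.
for chunk in pd.read_csv('desired_file...', chunksize=1000):
    chunks.append(chunk)
# Combine the pieces into a single DataFrame.
df = pd.concat(chunks, ignore_index=True)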

This reads the CSV in chunks of 1000 rows instead of all at once. Note that the concatenated DataFrame still ends up fully in memory.
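One caveat worth illustrating: when a file is read in pieces, each piece gets its own type inference, so the same column can come out as integers in one chunk and as strings in another. As far as I understand the pandas documentation, the default low_memory=True parsing does something similar internally, which is where the DtypeWarning comes from. A minimal sketch with a made-up in-memory CSV (the column name v and the data are hypothetical):

```python
import io
import pandas as pd

# Hypothetical data: the first two rows look numeric, the last two do not.
csv_text = "v\n1\n2\nx\ny\n"

# Each chunk of 2 rows gets its own dtype inference.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=2))
first_is_int = pd.api.types.is_integer_dtype(chunks[0]["v"])
second_is_int = pd.api.types.is_integer_dtype(chunks[1]["v"])

# After concatenation the column holds a mix of numbers and strings.
df = pd.concat(chunks, ignore_index=True)
has_str = any(isinstance(x, str) for x in df["v"])
has_non_str = any(not isinstance(x, str) for x in df["v"])
```

Passing an explicit dtype to read_csv would give every chunk the same type and avoid the mix.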

Upvotes: 1

Try the dtype parameter of pandas.read_csv.

It is described in the documentation: pandas.read_csv

For my CSV, I read all the columns as strings, and after loading the dataset I convert the columns I need to numbers using

DataFrame[Column] = pandas.to_numeric(DataFrame[Column], errors='coerce')
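Put together, that approach might look roughly like this. It is only a sketch: the in-memory CSV and the column names id and amount are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
csv_text = "id,amount\n1,10.5\n2,unknown\n3,7\n"

# Read every column as a string so no type inference happens.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert only the column that should be numeric;
# values that cannot be parsed become NaN instead of raising an error.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
```

With errors='coerce' the bad values are silently replaced by NaN, so it is worth counting the NaNs afterwards to see how much was lost.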

Upvotes: 0

FredrikHedman

Reputation: 1253

pd.read_csv has a number of parameters that will give you control over how to treat the different columns.

Without the data it is hard to be specific, so read up on what the options dtype and converters can do.

See the pandas manual for more details.

A first try could be

raw_data = pd.read_csv("C:/my.csv", dtype=str)

This should allow you to read the data and figure out how to set the data type on the columns that really matter.
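Once everything is loaded as strings, one possible way to see which values keep a column from being numeric is to try the conversion and list what fails. This is just a sketch: the helper non_numeric_values, the sample CSV, and its column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
csv_text = "code,qty\nA1,3\n17,4\nB2,x\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)


def non_numeric_values(series):
    """Return the distinct non-missing values that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    bad = series[parsed.isna() & series.notna()]
    return sorted(bad.unique())


# Map each column name to the values that block a numeric conversion.
report = {col: non_numeric_values(df[col]) for col in df.columns}
```

A column with an empty list can safely be converted to a number; a column with a few odd entries probably needs cleaning or should stay a string.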

Upvotes: 1

venkatadileep

Reputation: 183

Try this:

raw_data = pd.read_csv("C:/my.csv", low_memory=False)
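The warning text itself offers two remedies: set low_memory=False or specify dtype. For the second, the column positions listed in the warning can be turned into names by reading only the header first. A rough sketch, using a small made-up CSV and a hypothetical flagged list in place of the real positions:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
csv_text = "a,b,c\n1,x,3\n2,y,4\n"

# Column positions as they would appear in the DtypeWarning (made up here).
flagged = [0, 2]

# Read only the header row to translate positions into column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
dtype_map = {header.columns[i]: str for i in flagged}

# Read the full file, forcing the flagged columns to strings.
df = pd.read_csv(io.StringIO(csv_text), dtype=dtype_map)
```

Columns not listed in dtype_map are still inferred as usual, so only the problematic ones are pinned down.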

Upvotes: 2
