ner

Reputation: 711

Why did I get dtype 'object' on reading in my data frame?

I am new to Python and want to determine the type of each column in a data frame. I wrote the code below, but the results are not as expected: every column is reported as 'object'.

This is my data frame (just the first few rows):

IDINDUSANALYSE  IDINDUS  IDINDUSEFFLUENT  DATEANALYSE  IDTYPEECHANTILLON  IDPRELEVEUR  IDLABO  IDORIGINEVAL  CONFORME  CONFCALC  IDINDDOSS  CONFFORCE
672             635      6740             10/01/13     2                  1            3       1             1         1         531        0
673             635      6740             11/01/13     2                  1            3       1             1         1         531        0
674             635      6740             14/01/13     2                  1            3       1             1         1         531        0
675             635      6740             15/01/13     2                  1            3       1             1         1         531        0
676             635      6740             16/01/13     2                  1            3       1             1         1         531        0
677             635      6740             18/01/13     2                  1            3       1             1         1         531        0

This is my code:

import pandas as pd
import csv

with open("/home/***/Documents/Table3.csv") as f:
    r = csv.reader(f)

df = pd.DataFrame().from_records(r)
for index, row in df.iterrows():
    print(df.dtypes)   

As a result I get this:

0      object
1      object
2      object
3      object
4      object

Please tell me what I did wrong.

Upvotes: 0

Views: 1312

Answers (3)

smci

Reputation: 33940

Please show your actual CSV file. If all columns were stored as object, they were probably detected as strings, possibly because your CSV file quotes each field.
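A small sketch of one way every column can end up as strings, independent of quoting: csv.reader yields each field as a str, so a data frame built from its rows holds only text until it is converted. The in-memory CSV below is a hypothetical stand-in for Table3.csv, using three of its column names.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for Table3.csv (three columns, two rows).
text = "IDINDUSANALYSE,IDINDUS,DATEANALYSE\n672,635,10/01/13\n673,635,11/01/13\n"

# csv.reader returns every field as a str, whatever it looks like.
rows = list(csv.reader(io.StringIO(text)))
raw = pd.DataFrame(rows[1:], columns=rows[0])
print(type(raw.loc[0, "IDINDUS"]))  # a str, not an int

# Converting a column explicitly gives it a numeric dtype.
ids = pd.to_numeric(raw["IDINDUS"])
print(ids.dtype)

# pd.read_csv infers the numeric columns by itself.
parsed = pd.read_csv(io.StringIO(text))
print(parsed.dtypes)
```

Reading the file with pd.read_csv directly avoids the manual conversion step.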

To read in quoted fields in pandas and convert them back to their type (numeric/categorical), do either of:

import csv  # the QUOTE_* constants live in the csv module, not in pandas
pd.read_csv(..., quoting=csv.QUOTE_ALL)
pd.read_csv(..., quoting=csv.QUOTE_NONNUMERIC)

and read the section 'quoting' in https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

It is also good practice to explicitly pass pd.read_csv(..., dtype={...}), a dictionary telling it which type to use for each column name, e.g. {'a': np.float64, 'b': np.int32}.
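As a minimal sketch of the dtype dictionary, assuming a comma-separated file: the in-memory text below is a hypothetical stand-in for the real CSV, and the chosen integer widths are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file (three of its columns).
csv_text = io.StringIO(
    "IDINDUSANALYSE,DATEANALYSE,CONFORME\n"
    "672,10/01/13,1\n"
    "673,11/01/13,1\n"
)

# Columns listed in dtype get exactly that type; the rest are inferred.
df = pd.read_csv(
    csv_text,
    dtype={"IDINDUSANALYSE": "int64", "CONFORME": "int8"},
)
print(df.dtypes)
```

Columns left out of the dictionary, such as DATEANALYSE here, are still inferred by pandas.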

Upvotes: 0

dragonfire_007

Reputation: 165

Try this:

import pandas as pd
df = pd.read_csv("/home/***/Documents/Table3.csv")
types = [df[col].dtype for col in df.columns]  # index by the label itself; works for non-string names too
print(types)

which gives output such as:

[dtype('float64'), dtype('O'), dtype('O')]

Your actual data frame has 4 columns, yet you got object 5 times; that was your first hint that something was wrong.

Upvotes: 1

David

Reputation: 1202

types = df.columns.to_series().groupby(df.dtypes).groups

Then print out types, and you would get all of the column types (grouped by type).

Also, you can open the .csv file directly to a data frame using: pd.read_csv(filepath)

If you want a specific column's type, use df.column.dtype or df['column'].dtype.
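A short sketch of these lookups on a hypothetical in-memory CSV that reuses three column names from the question. The select_dtypes call is an alternative that is not part of the answer above; it is shown only as another way to list columns by type.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file.
text = "IDINDUS,DATEANALYSE,CONFORME\n635,10/01/13,1\n635,11/01/13,1\n"
df = pd.read_csv(io.StringIO(text))

print(df["IDINDUS"].dtype)  # dtype of a single column
print(df.dtypes)            # dtype of every column

# Alternative: list only the columns that were parsed as numbers.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```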

Upvotes: 1
