Reputation: 2150
I am loading a csv file into a Pandas DataFrame. For each column, how do I specify what type of data it contains using the dtype
argument?
I tried np.bool_ and pd.tslib.Timestamp without luck.
Code:
import pandas as pd
import numpy as np
df = pd.read_csv(<file-name>, dtype={'A': np.int64, 'B': np.float64})
Upvotes: 22
Views: 52531
Reputation: 7355
There are a lot of options for read_csv which will handle all the cases you mentioned. You might want to try dtype={'A': datetime.datetime}, but often you won't need dtypes as pandas can infer the types.
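To illustrate the inference point, here is a minimal sketch. The sample data and the column names A, B, C are made up for illustration, and io.StringIO stands in for the real file:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "A,B,C\n1,1.5,x\n2,2.5,y\n"

# No dtype argument: pandas inspects the values and picks a type per column.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```

Here A should come back as an integer column and B as a float column without any dtype being given; the text column C gets whatever string/object type your pandas version uses.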
For dates, you need to specify the parse_dates options:
parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
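As a rough sketch of the date case, the snippet below passes only parse_dates, which is usually enough for ISO-style dates. The column name "when" and the sample rows are assumptions made for illustration:

```python
import io
import pandas as pd

# Hypothetical sample data; "when" holds ISO-formatted dates.
csv_text = "A,when\n1,2015-01-31\n2,2015-02-28\n"

# parse_dates takes a list of column names (or positions) to parse as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])
print(df.dtypes)
```

The "when" column should come back as a datetime column rather than as strings, and the dtype argument can still be used alongside it for the other columns.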
In general for converting boolean values you will need to specify:
true_values : list Values to consider as True
false_values : list Values to consider as False
These will map any value in the lists to the boolean True/False. For more general conversions you will most likely need:
converters : dict, optional. Dict of functions for converting values in certain columns. Keys can either be integers or column labels
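A small sketch combining the boolean options with a converter might look like this. The column names, the "yes"/"no" spellings, and the parse_price helper are all hypothetical and only serve the example:

```python
import io
import pandas as pd

# Hypothetical sample data: a yes/no flag and a price with a currency sign.
csv_text = "flag,price\nyes,$1.50\nno,$2.25\n"


def parse_price(text):
    """Illustrative converter: strip a leading '$' and return a float."""
    return float(text.lstrip("$"))


df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],                # strings to read as True
    false_values=["no"],                # strings to read as False
    converters={"price": parse_price},  # called on each raw string in "price"
)
print(df.dtypes)
```

Note that a converter receives each cell as a string, so any cleaning has to happen before the numeric conversion.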
The documentation is dense, but it has the full list of options: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
Upvotes: 16