Reputation: 77
I am new to programming and I have written a program that reads and modifies a large Excel file using pandas in Python. In the code I have the following line:
df1 = df1.apply(lambda x : pd.to_numeric(x,errors='ignore'))
This does what I need it to, but it also turns the data below my header into floats. Is there a way to have them converted to an int type instead?
df1 is a dataframe and I am attempting to create a nested dictionary with its contents.
Upvotes: 1
Views: 122
Reputation: 164843
Option 2
Use this for a list of numeric columns in an existing dataframe:
cols = ['col1', 'col2', 'col3']
df1[cols] = df1[cols].apply(pd.to_numeric, errors='ignore', downcast='integer')
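The standard astype(int) is sub-optimal here since it does not downcast: it always produces the default integer type (usually int64) rather than the smallest one that fits the data.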
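To make the downcasting behaviour concrete, here is a minimal sketch on a made-up dataframe (the column names and values are assumptions for illustration only). Because errors='ignore' is deprecated in recent pandas releases, the sketch swaps in a small hypothetical helper, to_small_int, that leaves a column untouched when it cannot be parsed as numbers. It is not the exact call shown above, just one way to get a similar effect.

```python
import pandas as pd

# Hypothetical sample data: numeric strings, whole-number floats, and text.
df1 = pd.DataFrame({
    "col1": ["1", "2", "3"],
    "col2": [10.0, 20.0, 30.0],
    "col3": ["a", "b", "c"],
})


def to_small_int(column):
    """Convert a column to the smallest integer dtype that fits.

    Columns that cannot be parsed as numbers are returned unchanged,
    which stands in for the behaviour of errors='ignore'.
    """
    try:
        return pd.to_numeric(column, downcast="integer")
    except (ValueError, TypeError):
        return column


converted = df1.apply(to_small_int)

# Map each column name to the name of its resulting dtype.
dtypes = converted.dtypes.astype(str).to_dict()
print(dtypes)
```

Note that downcast='integer' only yields integers when every value in the column is a whole number; a column containing missing values or fractions stays as float.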
Option 1
As @AntonvBR mentions, ideally you want to read in series as downcasted integers, if at all possible. Then this separate conversion would not be necessary.
For example, the dtype parameter of pd.read_excel takes a dictionary input:
df = pd.read_excel('file.xlsx', dtype={'Col1': np.int8})
This will only work if you know your column names in advance, and if each listed column holds only whole numbers that fit the chosen type.
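As a rough illustration of setting the dtype at read time, the sketch below uses pd.read_csv on an in-memory string so that it runs without an Excel file on disk. pd.read_csv accepts the same kind of dtype dictionary; the sample data and the second column name are assumptions.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file contents; in practice this would be the file on disk.
csv_text = "Col1,Col2\n1,x\n2,y\n3,z\n"

# Ask the reader to store Col1 as 8-bit integers from the start.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Col1": np.int8})

col1_dtype = str(df["Col1"].dtype)
print(col1_dtype)
```

With this approach no separate conversion step is needed after loading, which is the point of Option 1.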
Upvotes: 4