Bahlsen

Reputation: 177

Change pandas dtype by column number for multiple columns

I would like to set the dtypes of a DataFrame that I am going to read in with pandas. I know that I can set the dtype by column name like this:

    df = pd.read_csv("blablab.csv", dtype={"Age": int})

However, I would like to set the dtype by column number. For example, columns 1, 3, and 5 should be "datetime", and column 6 through the last column should be "float". Is there anything like:

    df = pd.read_csv("blablab.csv", dtype = {1,3,5: datetime64, 6-end: float64})

Thank you very much, your help is greatly appreciated!

Upvotes: 1

Views: 1308

Answers (2)

Elton Clark

Reputation: 156

I would recommend building the dtype mapping before the import: read a single row to get the column names, use a dict comprehension to assign every column a default type, and then override the columns that need a special type. I use StringIO below only to run a self-contained test case.

    import pandas as pd
    import numpy as np
    from io import StringIO

    dummyCSV = """header 1,header 2,header 3
    1,2,3
    4,5,6
    7,8,9
    11,12,13
    14,15,16"""

    blabblab_csv = StringIO(dummyCSV, newline='\n')
    limitedRead = pd.read_csv(blabblab_csv, sep=",", nrows=1)

    # Set a default type and populate all column types
    defaultType = np.float64
    dTypes = {key: defaultType for key in list(limitedRead.columns)}
    # Then override the columns you want, using the integer position
    dTypes[limitedRead.columns[1]] = np.int32

    blabblab_csv = StringIO(dummyCSV, newline='\n')  # reset virtual file
    fullRead = pd.read_csv(blabblab_csv, sep=",", dtype=dTypes)
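
The datetime half of the question can probably be handled in the same pass. As far as I know, `dtype` is not the usual way to get datetime columns out of `read_csv`; `parse_dates` is, and it accepts a list of zero-based column positions. Below is a minimal sketch, assuming made-up sample data and hypothetical names (`csv_text`, `date_positions`, `float_start`), and assuming the positions in the question are zero-based:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data standing in for the real file.
csv_text = """id,start,label,end,note,updated,x,y
1,2021-01-01,a,2021-01-05,n1,2021-01-09,1,2
2,2021-02-01,b,2021-02-07,n2,2021-02-11,3,4
"""

# Read only the header line to learn the column names.
columns = list(pd.read_csv(StringIO(csv_text), nrows=0).columns)

date_positions = [1, 3, 5]  # zero-based positions to parse as dates (assumption)
float_start = 6             # this position through the last column becomes float

# Map every column from float_start onward to float64 by name.
dtypes = {name: "float64" for name in columns[float_start:]}

df = pd.read_csv(StringIO(csv_text), dtype=dtypes, parse_dates=date_positions)
print(df.dtypes)
```

With a real file you would pass the path in both `read_csv` calls instead of the `StringIO` objects.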

I know it's probably a little late for you, but I just had to do this for a project I'm working on, so hopefully the next search that hits this topic will find an answer waiting.

Upvotes: 2

ExplodingGayFish

Reputation: 2897

One way is to change the type after creating the DataFrame like this:

    import pandas as pd

    df = pd.DataFrame({'a': ['a', 'b', 'c'], 'b': ['c', 'd', 'e'],
                       'c': ['1', '2', '3'], 'd': ['4', '5', '6']})
    df[df.columns[2:]] = df[df.columns[2:]].astype(float)
    df['c']

Output:

    0    1.0
    1    2.0
    2    3.0
    Name: c, dtype: float64

Here I change the type of the last two columns to float.
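
The same after-the-fact approach should extend to the mixed case from the question. A rough sketch, using a made-up frame and hypothetical names (`frame`, `date_cols`, `float_cols`), that converts selected positions with `pd.to_datetime` and the trailing columns with `astype`:

```python
import pandas as pd

# Hypothetical frame where every column was read in as text.
frame = pd.DataFrame({
    "id": ["1", "2"],
    "start": ["2021-01-01", "2021-02-01"],
    "x": ["1.5", "2.5"],
    "y": ["3", "4"],
})

date_cols = frame.columns[[1]]   # zero-based positions to treat as dates
float_cols = frame.columns[2:]   # position 2 through the last column

# Convert the date columns one at a time.
for name in date_cols:
    frame[name] = pd.to_datetime(frame[name])

# Convert the remaining selected columns in a single astype call.
frame = frame.astype({name: "float64" for name in float_cols})
print(frame.dtypes)
```

Selecting by `frame.columns[...]` keeps the code positional while still handing pandas the column names it expects.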

Upvotes: 0
