Collective Action

Reputation: 8009

Trouble setting column dtypes in pandas (Python)

I am trying to set the dtype of two of my columns, but it is not working. I want to set [trans_typ] to 'category' and [date] to datetime. There is also an index [date] that I have already set to datetime, but I want the first column to be datetime as well.

import numpy as np
import pandas as pd
import glob

df = pd.read_csv('/home/jayaramdas/anaconda3/cf_data', low_memory=False, \
                 parse_dates = True) 
df.set_index(pd.to_datetime(df['date']), inplace=True)                 


df['trans_typ'].astype('category')

pd.to_datetime(df['date'])
df.dtypes

My output

date          object
cmte_id       object
trans_typ     object
amount       float64
fec_id        object
cand_id       object
dtype: object

This is the output of print(df):

             date    cmte_id trans_typ  amount     fec_id    cand_id
date                                                                     
2007-08-15  2007-08-15  C00112250       24K    2000  C00431569  P00003392
2007-09-26  2007-09-26  C00119040       24K    1000  C00367680  H2FL05127
2007-09-26  2007-09-26  C00119040       24K    1000  C00140715  H2MD05155
2007-07-20  2007-07-20  C00346296       24K    1000  C00434571  H8CA37137

Upvotes: 1

Views: 61

Answers (2)

jezrael

Reputation: 863451

You can use:

# if you need a copy of the date column as the index
df.set_index(df['date'], inplace=True)
print(df)
                 date    cmte_id trans_typ entity_typ state  employer  \
date                                                                    
2007-08-15 2007-08-15  C00112250       24K        ORG    DC       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    FL       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    MD       NaN   
2011-02-25 2011-02-25  C00478404       24K        COM    MN       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-22 2011-02-22  C00140855       24K        CCM    MD       NaN   
2011-02-28 2011-02-28  C00093963       24K        CCM    ND       NaN   

            occupation  amount     fec_id    cand_id  
date                                                  
2007-08-15         NaN    2000  C00431569  P00003392  
2007-09-26         NaN    1000  C00367680  H2FL05127  
2007-09-26         NaN    1000  C00140715  H2MD05155  
2011-02-25         NaN    2400  C00326629  H8MN06047  
2011-02-01         NaN    1000  C00373464  H2OH17109  
2011-02-01         NaN    1000  C00289983  H4KY01040  
2011-02-22         NaN    2500  C00140715  H2MD05155  
2011-02-28         NaN    1000  C00474619  H0ND00135 


# convert column trans_typ to category
# column date is already datetime here, so it is not converted
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
                 date    cmte_id trans_typ entity_typ state  employer  \
date                                                                    
2007-08-15 2007-08-15  C00112250       24K        ORG    DC       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    FL       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    MD       NaN   
2011-02-25 2011-02-25  C00478404       24K        COM    MN       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-22 2011-02-22  C00140855       24K        CCM    MD       NaN   
2011-02-28 2011-02-28  C00093963       24K        CCM    ND       NaN   

            occupation  amount     fec_id    cand_id  
date                                                  
2007-08-15         NaN    2000  C00431569  P00003392  
2007-09-26         NaN    1000  C00367680  H2FL05127  
2007-09-26         NaN    1000  C00140715  H2MD05155  
2011-02-25         NaN    2400  C00326629  H8MN06047  
2011-02-01         NaN    1000  C00373464  H2OH17109  
2011-02-01         NaN    1000  C00289983  H4KY01040  
2011-02-22         NaN    2500  C00140715  H2MD05155  
2011-02-28         NaN    1000  C00474619  H0ND00135
print(df.dtypes)
date          datetime64[ns]
cmte_id               object
trans_typ           category
entity_typ            object
state                 object
employer             float64
occupation           float64
amount                 int64
fec_id                object
cand_id               object
dtype: object

Or:

# if you DON'T need a copy of the date column as the index
df.set_index('date', inplace=True)
print(df)
              cmte_id trans_typ entity_typ state  employer  occupation  \
date                                                                     
2007-08-15  C00112250       24K        ORG    DC       NaN         NaN   
2007-09-26  C00119040       24K        CCM    FL       NaN         NaN   
2007-09-26  C00119040       24K        CCM    MD       NaN         NaN   
2011-02-25  C00478404       24K        COM    MN       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-22  C00140855       24K        CCM    MD       NaN         NaN   
2011-02-28  C00093963       24K        CCM    ND       NaN         NaN   

            amount     fec_id    cand_id  
date                                      
2007-08-15    2000  C00431569  P00003392  
2007-09-26    1000  C00367680  H2FL05127  
2007-09-26    1000  C00140715  H2MD05155  
2011-02-25    2400  C00326629  H8MN06047  
2011-02-01    1000  C00373464  H2OH17109  
2011-02-01    1000  C00289983  H4KY01040  
2011-02-22    2500  C00140715  H2MD05155  
2011-02-28    1000  C00474619  H0ND00135 
df['trans_typ'] = df['trans_typ'].astype('category')
print(df)
              cmte_id trans_typ entity_typ state  employer  occupation  \
date                                                                     
2007-08-15  C00112250       24K        ORG    DC       NaN         NaN   
2007-09-26  C00119040       24K        CCM    FL       NaN         NaN   
2007-09-26  C00119040       24K        CCM    MD       NaN         NaN   
2011-02-25  C00478404       24K        COM    MN       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-22  C00140855       24K        CCM    MD       NaN         NaN   
2011-02-28  C00093963       24K        CCM    ND       NaN         NaN   

            amount     fec_id    cand_id  
date                                      
2007-08-15    2000  C00431569  P00003392  
2007-09-26    1000  C00367680  H2FL05127  
2007-09-26    1000  C00140715  H2MD05155  
2011-02-25    2400  C00326629  H8MN06047  
2011-02-01    1000  C00373464  H2OH17109  
2011-02-01    1000  C00289983  H4KY01040  
2011-02-22    2500  C00140715  H2MD05155  
2011-02-28    1000  C00474619  H0ND00135 

print(df.dtypes)
cmte_id         object
trans_typ     category
entity_typ      object
state           object
employer       float64
occupation     float64
amount           int64
fec_id          object
cand_id         object
dtype: object

print(df.index)
DatetimeIndex(['2007-08-15', '2007-09-26', '2007-09-26', '2011-02-25',
               '2011-02-01', '2011-02-01', '2011-02-22', '2011-02-28'],
              dtype='datetime64[ns]', name=u'date', freq=None)
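
For context on why the code in the question left both columns as object: astype() and pd.to_datetime() return new objects rather than modifying df, so their results have to be assigned back. Also, parse_dates=True only asks read_csv to try parsing the index, and no index column was given at read time. The sketch below illustrates both points. It assumes a small in-memory CSV (csv_text is a hypothetical stand-in for the real file) with the column names from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file in the question (same column names).
csv_text = """date,cmte_id,trans_typ,amount,fec_id,cand_id
2007-08-15,C00112250,24K,2000,C00431569,P00003392
2007-09-26,C00119040,24K,1000,C00367680,H2FL05127
2007-09-26,C00119040,24K,1000,C00140715,H2MD05155
"""

df = pd.read_csv(io.StringIO(csv_text))

# These calls return new objects; without assignment, df is left unchanged.
df['trans_typ'].astype('category')
pd.to_datetime(df['date'])
unchanged = df.dtypes.copy()

# Assigning the results back is what actually changes the column dtypes.
df['trans_typ'] = df['trans_typ'].astype('category')
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)

# Alternative: request both dtypes while reading the file.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['date'],             # parse this column, not just the index
    dtype={'trans_typ': 'category'},  # read trans_typ as categorical
)
print(df2.dtypes)
```

If the index should be datetime as well, calling df.set_index(df['date']) after the conversion keeps the column and reuses its converted values.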

Upvotes: 1

Collective Action

Reputation: 8009

I just used df['date'] = df['date'].astype('datetime64') and it works!
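
As a hedged note for later readers: to my knowledge, newer pandas releases reject the unit-less 'datetime64' string in astype() and ask for an explicit unit such as 'datetime64[ns]', so it is worth checking this against your installed version. A sketch of an alternative that does not depend on that behaviour is pd.to_datetime(), shown here on a small made-up frame (the sample values are assumptions, not the real data):

```python
import pandas as pd

# Made-up sample with date strings in the same format as the question.
df = pd.DataFrame({'date': ['2007-08-15', '2007-09-26', '2011-02-25']})

# Parse the strings and assign the result back to the column.
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)

# errors='coerce' turns unparseable entries into NaT instead of raising.
messy = pd.Series(['2007-08-15', 'not a date'])
parsed = pd.to_datetime(messy, errors='coerce')
print(parsed)
```

Once the column is datetime, the .dt accessor (for example df['date'].dt.year) becomes available.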

Upvotes: 0
