Steven

Reputation: 15258

Numbers used as string in Pandas index

I have the following file:

Contract, FG
9896342,Y
11037874,Y
6912529,Y
9896652,N
363291,Y
7348524,Y
6078482,Y
7795457,N
2486242,Y
3297980,Y
9760560,Y
1200533,N
11033963,N
7861603,Y
8218268,Y
9760247,Y

I would like to create a pandas DataFrame from this file and use the Contract column as a string (or unicode) index. The values look like numbers, but they are really strings.

I tried this:

DF = pd.read_csv('C:\\Users\\S.Benet\\Desktop\\test.txt', index_col='Contract', dtype=object, encoding='utf-8')

But the index is still parsed as integers:

>>> DF.index
Int64Index([ 9896342, 11037874,  6912529,  9896652,   363291,  7348524,
             6078482,  7795457,  2486242,  3297980,  9760560,  1200533,
            11033963,  7861603,  8218268,  9760247],
           dtype='int64', name=u'Contract')

How can I force it to be a string index?

Upvotes: 0

Views: 44

Answers (1)

unutbu

Reputation: 879471

If you use set_index instead of index_col, then the index will contain strings:

df = pd.read_csv('data', dtype=object, encoding='utf-8')
df = df.set_index('Contract')

or, equivalently,

df = pd.read_csv('data', dtype=object, encoding='utf-8').set_index('Contract')

In [154]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 16 entries, 9896342 to 9760247   # <-- a generic Index, not an Int64Index
Data columns (total 1 columns):
 FG    16 non-null object
dtypes: object(1)
memory usage: 256.0+ bytes

In [155]: df.index[0]
Out[155]: '9896342'

In [156]: type(df.index[0])
Out[156]: str
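
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The csv_text string is a hypothetical in-memory stand-in for the first few rows of the file in the question, and the variable names are made up for illustration. It also shows a fallback that converts an index that has already been parsed as integers, assuming the identifiers have no leading zeros (those would already be lost once the values were parsed as integers).

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question (first four rows only).
csv_text = "Contract,FG\n9896342,Y\n11037874,Y\n6912529,Y\n9896652,N\n"

# Approach from the answer: read everything as object, then set the index.
df = pd.read_csv(io.StringIO(csv_text), dtype=object).set_index("Contract")
first_label = df.index[0]
print(type(first_label))          # the label is a Python str
print(df.loc["9896342", "FG"])    # lookups now need string keys

# Fallback sketch: the index was already parsed as integers, so convert it.
# Leading zeros, if the real data had any, cannot be recovered this way.
int_df = pd.read_csv(io.StringIO(csv_text), index_col="Contract")
int_df.index = int_df.index.astype(str)
converted_label = int_df.index[0]
print(type(converted_label))
```

Reading with dtype=object first is the safer of the two, because the values are never converted to numbers in the first place.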

Upvotes: 1
