code base 5000

Reputation: 4102

Pandas Dataframe Issue Converting Column dtype

I have a simple pandas dataframe with a column:

import pandas as pd

col = ['A']  # a flat list of names; a nested list would create MultiIndex columns
data = [[1.0],[2.3],[3.4]]
df = pd.DataFrame.from_records(data, columns=col)

This creates a dataframe with one column of type np.float64, which is what I want.

Later in the process, I want to add another column of type string.

df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"

The dtype of this column is coming through as object, but I need it to be a string type. So I do the following:

df['SOMETEXT'] = df['SOMETEXT'].astype(str)

If I look at the dtype again, I get the same dtype for that column: object.

A later step in my workflow is dtype-sensitive, so I need the column to be a string.

Any ideas?

array = df.to_records(index=False) # convert to numpy array

The array's dtype still reports object for that field, but the field should be a string.

Upvotes: 0

Views: 677

Answers (1)

chrisaycock

Reputation: 37930

In pandas, strings have traditionally been stored with the object dtype, which is why astype(str) still reports object. It confused me too when I first started.
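To check this for yourself, here is a minimal sketch that rebuilds the frame from the question and inspects the column. The names `converted` and `all_str` are mine, not from the question. The dtype that gets printed depends on your pandas version (older releases show object), so I have not written an expected value for it.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({"A": [1.0, 2.3, 3.4]})
df["SOMETEXT"] = "SOME TEXT FOR ANALYSIS"

converted = df["SOMETEXT"].astype(str)

# The reported dtype varies by pandas version; older releases print object.
print(df["SOMETEXT"].dtype, converted.dtype)

# Whatever dtype is reported, each element is an ordinary Python str.
all_str = all(isinstance(value, str) for value in converted)
print(all_str)
```

So the data is already text; only the dtype label differs from what NumPy would call a string.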

Once in NumPy, you can cast the string:

In [24]: array['SOMETEXT'].astype(str)
Out[24]: 
array(['SOME TEXT FOR ANALYSIS', 'SOME TEXT FOR ANALYSIS',
       'SOME TEXT FOR ANALYSIS'], 
      dtype='<U22')
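If the downstream step needs the whole record array to carry a string dtype, not just one extracted field, one option is to copy the records into a new structured array. This is only a sketch under my own assumptions: `records`, `new_descr` and `typed` are illustrative names, and the width of each string field is taken from its longest value.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question and convert it to a record array.
df = pd.DataFrame({"A": [1.0, 2.3, 3.4]})
df["SOMETEXT"] = "SOME TEXT FOR ANALYSIS"
records = df.to_records(index=False)

# Describe a new dtype: object fields become fixed-width unicode fields,
# sized to the longest value; every other field keeps its dtype.
new_descr = []
for name in records.dtype.names:
    field_dtype = records.dtype[name]
    if field_dtype.kind == "O":
        width = max(len(str(value)) for value in records[name])
        new_descr.append((name, f"U{width}"))
    else:
        new_descr.append((name, field_dtype))

# Copy field by field; NumPy casts the object values to unicode on assignment.
typed = np.empty(len(records), dtype=new_descr)
for name in records.dtype.names:
    typed[name] = records[name]

print(typed.dtype)
```

Fixed-width fields truncate anything longer than the declared width, so compute the width from the data (as above) rather than hard-coding it.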

Upvotes: 3
