Reputation: 4102
I have a simple pandas dataframe with a column:
import pandas as pd

col = ['A']
data = [[1.0], [2.3], [3.4]]
df = pd.DataFrame.from_records(data, columns=col)
This creates a dataframe with one column of type np.float64, which is what I want.
Later in the process, I want to add another column of type string.
df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"
The dtype of this column comes through as object, but I need it to be a string type. So I do the following:
df['SOMETEXT'] = df['SOMETEXT'].astype(str)
If I look at the dtype again, I get the same dtype for that column: object.
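For reference, checking the dtypes at this point gives output along these lines (approximate):
print(df.dtypes)
# A           float64
# SOMETEXT     object
# dtype: object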
I have a process further down my workflow that is dtype sensitive, and I need the column to be a string.
Any ideas?
array = df.to_records(index=False)  # convert to a NumPy record array
The dtypes on the array still carry the object dtype, but the text column should be a string.
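For what it's worth, printing the array's dtype shows something like this (approximate output):
print(array.dtype)
# [('A', '<f8'), ('SOMETEXT', 'O')]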
Upvotes: 0
Views: 677
Reputation: 37930
In pandas, strings are stored with the object dtype. It confused me too when I first started.
Once the data is in NumPy, you can cast the string column:
In [24]: array['SOMETEXT'].astype(str)
Out[24]:
array(['SOME TEXT FOR ANALYSIS', 'SOME TEXT FOR ANALYSIS',
'SOME TEXT FOR ANALYSIS'],
dtype='<U22')
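If you want to avoid the object dtype in the record array altogether, to_records also accepts a column_dtypes mapping (pandas 0.24 or later). A minimal sketch of that approach, reusing your column names:

import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.3, 3.4]})
df['SOMETEXT'] = 'SOME TEXT FOR ANALYSIS'

# Store the text column as a fixed-width (22-character) Unicode string
# during the conversion, instead of carrying an object column over.
array = df.to_records(index=False, column_dtypes={'SOMETEXT': '<U22'})
print(array.dtype)  # roughly: [('A', '<f8'), ('SOMETEXT', '<U22')]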
Upvotes: 3