Unable to convert an object to a string

Question

When I try to convert a pandas column from object to string in Python 3, the code I am using doesn't raise an error, but it also doesn't change the dtype.

import pandas as pd
import numpy as np
import nltk
import os
nltk.download('punkt')
import nltk.corpus
import re

#Read in fields
jan = pd.read_excel(r'C:\Users\Sabrina\JIRA\2019\2019_jan.xls')

#Indicate columns for performing tokenization
jan_a = pd.DataFrame(jan, columns= ['Summary'])

#Tokenize columns for text analysis
jan_a['Summary'] = jan_a.apply(lambda column: 
    nltk.word_tokenize(column['Summary']), axis=1)
print(jan_a)
print(jan_a['Summary'].dtypes)

#Convert list to string
jan_a['Summary'].astype('str')
print(jan_a['Summary'].dtypes)

Both dtypes print as object. Any assistance would be appreciated!
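A minimal, self-contained reproduction of the behavior described above. It assumes hypothetical summary text in place of the Excel file and uses str.split() as a stand-in for nltk.word_tokenize so that no punkt download is needed; both produce a column whose cells are Python lists.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; str.split() replaces
# nltk.word_tokenize here only so the sketch needs no punkt download.
jan_a = pd.DataFrame({"Summary": ["Login page fails", "Export times out"]})
jan_a["Summary"] = jan_a["Summary"].str.split()  # each cell becomes a list

print(jan_a["Summary"].dtypes)  # object (list cells always need an object column)

# Same call as in the question: the converted Series is returned and discarded.
jan_a["Summary"].astype("str")
print(jan_a["Summary"].dtypes)  # still object, because nothing was assigned back
```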

Copperfield · Accepted Answer

By default, pandas stores Python str values in a column of the generic object dtype:

>>> import pandas as pd
>>> df = pd.DataFrame(["aa1 bb2 cc3".split(),"aa4 bb5 cc6".split()],columns="col1 col2 col3".split())
>>> 
>>> df
  col1 col2 col3
0  aa1  bb2  cc3
1  aa4  bb5  cc6
>>> df["col1"]
0    aa1
1    aa4
Name: col1, dtype: object
>>>
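Because an object column accepts any Python object, the object dtype by itself does not tell you the cells are strings. A small sketch with hypothetical values contrasting it with the dedicated string dtype:

```python
import pandas as pd

# The object dtype is a catch-all: one column can hold a str, an int and a
# list side by side, so "object" alone does not mean "string".
mixed = pd.Series(["aa1", 5, ["bb2", "cc3"]])
print(mixed.dtype)  # object

# The string dtype, by contrast, only holds str values (or missing values).
strings = pd.Series(["aa1", "bb2"], dtype="string")
print(strings.dtype)  # string
```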

You need to tell pandas explicitly that you want strings, either at creation time by passing dtype="string":

>>> df2 = pd.DataFrame(["aa1 bb2 cc3".split(),"aa4 bb5 cc6".split()],dtype="string",columns="col1 col2 col3".split())
>>> df2
  col1 col2 col3
0  aa1  bb2  cc3
1  aa4  bb5  cc6
>>> df2["col1"]
0    aa1
1    aa4
Name: col1, dtype: string
>>>

or later, by converting the column with astype:

>>> df["col1"].astype("string")
0    aa1
1    aa4
Name: col1, dtype: string
>>>
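Note that astype does not modify the DataFrame in place; it returns a new object, which is why the call in the question leaves the dtype unchanged. A sketch reusing the example frame from this answer, showing the result assigned back, plus a dict form that converts several columns in one call:

```python
import pandas as pd

df = pd.DataFrame(["aa1 bb2 cc3".split(), "aa4 bb5 cc6".split()],
                  columns="col1 col2 col3".split())

# astype() returns a new Series; assign it back or the column is unchanged.
df["col1"] = df["col1"].astype("string")

# A dict maps column names to dtypes; this also returns a new DataFrame.
df = df.astype({"col2": "string", "col3": "string"})
print(df.dtypes)
```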

See the relevant part of the pandas documentation for more detail: https://pandas.pydata.org/docs/user_guide/text.html#text-types
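In the question's code each Summary cell is a list of tokens, so converting those cells directly to strings would produce text like "['Login', 'page', 'fails']". A sketch under the assumption that the "#Convert list to string" step is meant to rejoin the tokens with spaces; the token lists are hypothetical stand-ins for the nltk.word_tokenize output:

```python
import pandas as pd

# Hypothetical token lists, standing in for the nltk.word_tokenize output.
jan_a = pd.DataFrame({"Summary": [["Login", "page", "fails"],
                                  ["Export", "times", "out"]]})

# Assumed goal: join each token list into one space-separated string,
# then request the string dtype and assign the result back to the column.
jan_a["Summary"] = jan_a["Summary"].str.join(" ").astype("string")

print(jan_a["Summary"].dtypes)    # string
print(jan_a["Summary"].tolist())  # ['Login page fails', 'Export times out']
```

If the tokens are only needed temporarily, keeping the original Summary text and tokenizing into a separate column avoids the round trip altogether.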
