Benjamin Fogiel

Reputation: 146

Cast multiple variable types to a Series object (DataFrame column)

Explanation of my issue

Take this DataFrame for example:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data = np.array([
        ['A','1'],
        ['B','2'],
        ['C', 'False'],
    ])
)

Is there a good way to appropriately set the second column's element type to either float or boolean?

I am given a DataFrame in which every value is initially a string. In reality, I have many rows, and each DataFrame is different, so the positions that need to become floats or bools change. Therefore, I cannot create a default dtype 'template' to refer to.

Solutions I've Explored

Summary

Essentially, given a Series whose elements are all initially strings, I need to cast each element to either float or bool, as appropriate. Is there an elegant way of doing this without looping through each element and casting it to float or bool by hand? Is there a pandas function that I'm missing?

Thanks in advance for any help!

Upvotes: 1

Views: 366

Answers (1)

Benjamin Fogiel

Reputation: 146

Solution

Here is what worked:

Given:

df = pd.DataFrame(
    data = np.array([
        ['A','1'],
        ['B','2'],
        ['C', 'False'],
    ])
)

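Assuming that every boolean value in the DataFrame is the string "False" or "True", and that every other value is a valid float, we can use a lambda function to go through the rows of the DataFrame and cast each value. Note that the boolean must come from comparing against 'True', because bool() of any non-empty string, including 'False', is True: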

df[1] = df.apply(lambda row: row[1] == 'True' if row[1] in ('True', 'False') else float(row[1]), axis=1)

which results in the desired output:

>>> type(df[1][0])
<class 'float'>
>>> type(df[1][1])
<class 'float'>
>>> type(df[1][2])
<class 'bool'>
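
For what it's worth, the same idea can be written against the single column with Series.map rather than a row-wise apply. This is only a sketch under the same assumptions as above (booleans appear exactly as "True" or "False", everything else parses as a float); the helper name parse_cell is made up for illustration and is not a pandas function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([
        ['A', '1'],
        ['B', '2'],
        ['C', 'False'],
    ])
)

def parse_cell(value):
    """Hypothetical helper: turn 'True'/'False' into bool, anything else into float."""
    # Compare against the literal instead of calling bool():
    # bool('False') is True, because any non-empty string is truthy.
    if value == 'True':
        return True
    if value == 'False':
        return False
    return float(value)  # raises ValueError if the string is not a valid float

# Map only the column that needs casting; the result is an object-dtype
# column, which is what allows floats and bools to sit side by side.
df[1] = df[1].map(parse_cell)
```

This is still an element-by-element conversion under the hood. A column that keeps floats and bools as distinct types has to be object dtype, so there is little to gain from a fully vectorized cast here.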

Upvotes: 1
