Split pandas column into new columns in presence of NaN

Question

I have a pandas DataFrame containing a string column which needs splitting into two separate columns. The answer using tolist that I found on SO from this question works like a charm, except when my column contains NaNs. The excerpt below describes the difficulty:

import pandas as pd
import numpy as np

# Example DataFrame
df = pd.DataFrame([[25.0, '34.2/ 18.1', 'one'],
                   [32.6, '28.6/ 17.9', 'two'],
                   [12.5, '30.1/ 17.6', 'three']], columns=['A', 'B', 'C'])
df2 = df.copy()

# This method works when all data are present
df['D'] = pd.DataFrame(df['B'].str.split('/').tolist())[1]

# However, when there are NaNs:
df2['B'][0] = np.nan

# This line fails
df2['D'] = pd.DataFrame(df2['B'].str.split('/').tolist())[1]

It gives me a KeyError, because the intermediate DataFrame only has one column, indicating that the bother of going to a list and back doesn't accomplish anything anymore:

               0
0            NaN
1  [28.6,  17.9]
2  [30.1,  17.6]

I've tried dropping the NaN first via pd.DataFrame(df2['B'].str.split('/').dropna().tolist()), but then I lose my index ... I need to keep the NaN at index 0. I've also thought of somehow duplicating the NaN in the creation of the intermediate DataFrame to force the two columns, but am having no luck.

This is what I would need my data to look like for df2:

      A           B      C     D
0  25.0         NaN    one   NaN
1  32.6  28.6/ 17.9    two  17.9
2  12.5  30.1/ 17.6  three  17.6

Is there a way to do this without using a list as an intermediary? Or somehow deal with the NaN?

Alex Riley · Accepted Answer

You can continue to use your method if you use the str accessor again after the split (instead of using tolist() and making another DataFrame):

>>> df2['D'] = df2['B'].str.split('/').str[-1]
>>> df2
      A           B      C      D
0  25.0         NaN    one    NaN
1  32.6  28.6/ 17.9    two   17.9
2  12.5  30.1/ 17.6  three   17.6

This returns NaN if the index doesn't exist, instead of raising the error.

Split pandas column into new columns in presence of NaN

Answers (2)

Related Questions