How to deal with NaN and None in Multi-label binarization

Question

I am working on a multi-label classification project with scikit-learn. I want to binarize the target feature, but I am having difficulty transforming the data.

Here is the raw data:

107              RA37|RA41|RM153 |RWT037
108    DA35|DA47|DWT030|DA35|DA47|DWT030
109                                  NaN
110                        PI001 |PI040 
111                        PI001 |PI040 
112                     RA37|RA41|RWT037
113    DA35|DA47|DWT030|DA35|DA47|DWT030
114                                  NaN
Name: exclusions, dtype: object

Then I split it into multiple columns with str.split('|',expand=True) and got the following output:

         0      1       2       3     4       5     6     7     8     9  ...    18    19    20    21    22    23    24    25    26    27
107   RA37   RA41   RM153  RWT037  None    None  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
108   DA35   DA47  DWT030    DA35  DA47  DWT030  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
109    NaN    NaN     NaN     NaN   NaN     NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
110  PI001  PI040    None    None  None    None  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
111  PI001  PI040    None    None  None    None  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
112   RA37   RA41  RWT037    None  None    None  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
113   DA35   DA47  DWT030    DA35  DA47  DWT030  None  None  None  None  ...  None  None  None  None  None  None  None  None  None  None
114    NaN    NaN     NaN     NaN   NaN     NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN

As you can see, because the column already contained many NaN values before processing, the result is a mix of NaN and None. That means I cannot pass it directly to MultiLabelBinarizer, since it cannot handle these mixed data types. How do I fix this problem? Thanks in advance!
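
One possible way around this, offered as a sketch rather than a definitive fix: skip the expand=True step and turn each cell into a set of labels, mapping missing cells to an empty set so that MultiLabelBinarizer only ever sees collections of strings. The toy series below is modelled on the exclusions column shown above, and the helper name to_label_set is made up for illustration. Stripping whitespace assumes that values such as "RM153 " and "RM153" are meant to be the same label.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data modelled on the 'exclusions' column in the question (illustrative only).
exclusions = pd.Series(
    [
        "RA37|RA41|RM153 |RWT037",
        "DA35|DA47|DWT030|DA35|DA47|DWT030",
        np.nan,
        "PI001 |PI040 ",
    ],
    index=[107, 108, 109, 110],
    name="exclusions",
    dtype=object,
)


def to_label_set(value):
    """Hypothetical helper: one pipe-separated cell -> set of cleaned labels."""
    if pd.isna(value):
        # Missing cell: no labels at all, which becomes an all-zero row.
        return set()
    # Assumption: surrounding whitespace is noise; empty tokens are dropped.
    return {token.strip() for token in value.split("|") if token.strip()}


label_sets = [to_label_set(value) for value in exclusions]

mlb = MultiLabelBinarizer()
binary = pd.DataFrame(
    mlb.fit_transform(label_sets),
    columns=mlb.classes_,
    index=exclusions.index,
)
print(binary)
```

Using sets also collapses repeated labels within a row (row 108 lists each code twice), and rows that were NaN end up as all zeros instead of raising an error. If an all-zero row is not meaningful for the model, those rows could be dropped beforehand with exclusions.dropna().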
