Max

Reputation: 471

Not filling NaN values in dataframe

Let's say I have the following df:

      quantity#1    taxsubtotal#1    taxrate#1    quantity#2    taxsubtotal#2    taxrate#2
--  ------------  ---------------  -----------  ------------  ---------------  -----------
 0           nan             1.05           21           nan              nan          nan
 2             1             2.1            21             1              1.8            9
 6             1             0               0           nan              nan          nan
13             1             0.9             9             1              1.8            9
21             1            23.4             9             1              2.7            9

When I build a new df from these columns, I don't want the NaN values to be written into it:

df3 = pd.DataFrame({
'InvoiceLine1':"""
    <cbc:ID>1</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#1'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#1'].astype(str)+"""</cbc:Percent>""",
'InvoiceLine2':"""
    <cbc:ID>2</cbc:ID>
    <cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#2'].astype(str)+"""</cbc:InvoicedQuantity>
        <cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#2'].astype(str)+"""</cbc:TaxAmount>
          <cbc:Percent>"""+dftaxitems1['taxrate#2'].astype(str)+"""</cbc:Percent>""",
})

Checking the type of the NaN values:

type(dftaxitems['quantity#2'][0])
numpy.float64

I get the following output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

Desired output:

    InvoiceLine1                                       InvoiceLine2
0   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
2   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6   \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... 
13  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21  \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...

df3.fillna('') did not work!
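
A likely reason, sketched under the assumption that the missing values are real float NaN (as the type check above suggests): once each value is converted to text, NaN becomes the literal string 'nan', so the concatenated result contains no missing values for fillna to fill. The sample Series below is hypothetical, and map(str) is used only to make the per-element conversion explicit; whether astype(str) behaves the same way depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring quantity#2: one real NaN, one real value
quantity = pd.Series([np.nan, 1.0], index=[0, 2], name='quantity#2')

# Converting every element with str() turns NaN into the text 'nan'
as_text = quantity.map(str)
line = '<cbc:InvoicedQuantity unitCode="ZZ">' + as_text + '</cbc:InvoicedQuantity>'

# No missing values are left, so fillna('') has nothing to replace
print(line.isna().sum())
print(line.fillna('').loc[0])
```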

What could help here?

I've tried converting all the missing values to np.nan so that they can be removed cleanly from the new df.

Please help!

Upvotes: 0

Views: 51

Answers (1)

jezrael

Reputation: 862641

First convert the values to strings, then replace empty strings with missing values:

df = df.astype(str).replace('', np.nan)

Then drop the later .astype(str) calls, such as the one in dftaxitems1['quantity#1'].astype(str), so that the missing values stay missing during concatenation.

Test:

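import numpy as np
import pandas as pd

# The empty string stands in for a missing quantity
dftaxitems1 = pd.DataFrame({'quantity#1': ['', 1.0, 1.0, 1.0, 1.0]})
dftaxitems1 = dftaxitems1.astype(str).replace('', np.nan)

# No .astype(str) here, so NaN propagates through the concatenation
s = """<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1']+"""</cbc:InvoicedQuantity>"""

print(s)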
0                                                  NaN
1    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
2    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
3    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
4    <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
Name: quantity#1, dtype: object
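
If the missing values are real float NaN rather than empty strings, as the numpy.float64 type check in the question suggests, a variant of the same idea may be needed. The following is a minimal sketch, not the asker's actual data: the sample frame and the helper as_text are hypothetical, and only two of the three tags are built to keep it short. It converts values to text while keeping NaN missing, so the NaN propagates through the concatenation and fillna('') can then blank out the whole line.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 0 and 6 have no second invoice line
dftaxitems1 = pd.DataFrame(
    {'quantity#2': [np.nan, 1.0, np.nan],
     'taxrate#2': [np.nan, 9.0, np.nan]},
    index=[0, 2, 6],
)

def as_text(column):
    """Return the column as strings, keeping real NaN as missing values."""
    return column.map(str).where(column.notna())

line2 = (
    '<cbc:InvoicedQuantity unitCode="ZZ">' + as_text(dftaxitems1['quantity#2'])
    + '</cbc:InvoicedQuantity><cbc:Percent>' + as_text(dftaxitems1['taxrate#2'])
    + '</cbc:Percent>'
)

# Rows with NaN are still missing after concatenation, so fillna('') applies
df3 = pd.DataFrame({'InvoiceLine2': line2.fillna('')})
print(df3)
```

With this approach, rows without a second invoice line end up as empty strings in InvoiceLine2, which matches the desired output in the question.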

Upvotes: 1
