Reputation: 471
Let's say I have the following df:
quantity#1 taxsubtotal#1 taxrate#1 quantity#2 taxsubtotal#2 taxrate#2
-- ------------ --------------- ----------- ------------ --------------- -----------
0 nan 1.05 21 nan nan nan
2 1 2.1 21 1 1.8 9
6 1 0 0 nan nan nan
13 1 0.9 9 1 1.8 9
21 1 23.4 9 1 2.7 9
I don't want the NaN values to be written to the columns of a new df:
df3 = pd.DataFrame({
'InvoiceLine1':"""
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1'].astype(str)+"""</cbc:InvoicedQuantity>
<cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#1'].astype(str)+"""</cbc:TaxAmount>
<cbc:Percent>"""+dftaxitems1['taxrate#1'].astype(str)+"""</cbc:Percent>""",
'InvoiceLine2':"""
<cbc:ID>2</cbc:ID>
<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#2'].astype(str)+"""</cbc:InvoicedQuantity>
<cbc:TaxAmount currencyID="EUR">"""+dftaxitems1['taxsubtotal#2'].astype(str)+"""</cbc:TaxAmount>
<cbc:Percent>"""+dftaxitems1['taxrate#2'].astype(str)+"""</cbc:Percent>""",
})
Checking the type of a NaN value:
type(dftaxitems['quantity#2'][0])
numpy.float64
I get the following output:
InvoiceLine1 InvoiceLine2
0 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
2 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
13 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
Desired output:
InvoiceLine1 InvoiceLine2
0 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua...
2 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
6 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua...
13 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
21 \n <cbc:ID>1</cbc:ID>\n <cbc:InvoicedQua... \n <cbc:ID>2</cbc:ID>\n <cbc:InvoicedQua...
df3.fillna('') did not work!
What could help here?
I've also tried converting all the values to np.nan so that they can be reliably dropped in the new df.
Please help!
Upvotes: 0
Views: 51
Reputation: 862641
First convert the values to strings, then replace the empty strings with missing values:
df = df.astype(str).replace('', np.nan)
Then remove the later .astype(str) calls, such as the one in dftaxitems1['quantity#1'].astype(str).
Test:
dftaxitems1 = pd.DataFrame({'quantity#1': ['', 1.0, 1.0, 1.0, 1.0]})
dftaxitems1 = dftaxitems1.astype(str).replace('', np.nan)
s = """<cbc:InvoicedQuantity unitCode="ZZ">"""+dftaxitems1['quantity#1']+"""</cbc:InvoicedQuantity>"""
print (s)
0 NaN
1 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
2 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
3 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
4 <cbc:InvoicedQuantity unitCode="ZZ">1.0</cbc:I...
Name: quantity#1, dtype: object
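The test above starts from empty strings. If the missing values are real float NaN, as the numpy.float64 check in the question suggests, .astype(str) can turn them into the text 'nan' (this depends on the pandas version), and then neither replace('', np.nan) nor df3.fillna('') finds anything to change. Below is a minimal sketch of one way around that, not the only one. The helper name invoice_line and the two-row sample frame are made up for illustration, and it assumes that a line should be blank only when all three of its source columns are missing, which is how I read the desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row miniature of the question's frame (index labels 0 and 2).
dftaxitems1 = pd.DataFrame(
    {
        "quantity#1": [np.nan, 1.0],
        "taxsubtotal#1": [1.05, 2.1],
        "taxrate#1": [21, 21],
        "quantity#2": [np.nan, 1.0],
        "taxsubtotal#2": [np.nan, 1.8],
        "taxrate#2": [np.nan, 9.0],
    },
    index=[0, 2],
)


def invoice_line(df, n):
    """Build the XML text for line n (hypothetical helper, not from the question)."""
    cols = [f"quantity#{n}", f"taxsubtotal#{n}", f"taxrate#{n}"]
    # Stringify each field, writing "" rather than "nan" for a missing value.
    qty, tax, rate = (df[c].astype(str).where(df[c].notna(), "") for c in cols)
    line = (
        f"\n<cbc:ID>{n}</cbc:ID>\n"
        + '<cbc:InvoicedQuantity unitCode="ZZ">' + qty + "</cbc:InvoicedQuantity>\n"
        + '<cbc:TaxAmount currencyID="EUR">' + tax + "</cbc:TaxAmount>\n"
        + "<cbc:Percent>" + rate + "</cbc:Percent>"
    )
    # Assumption: blank the whole line only when all three fields are missing.
    return line.where(df[cols].notna().any(axis=1), "")


df3 = pd.DataFrame(
    {
        "InvoiceLine1": invoice_line(dftaxitems1, 1),
        "InvoiceLine2": invoice_line(dftaxitems1, 2),
    }
)
print(df3)
```

With this, a row whose second line is entirely NaN gets an empty string in InvoiceLine2, while a row with only one missing field keeps its line and leaves that one element empty. If you would rather blank a line as soon as any field is missing, swap .any(axis=1) for .all(axis=1).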
Upvotes: 1