user11644345

Why NaN in pivot table?

I've removed all NaN from a df using df = df.fillna(0).

After I create a pivot table using

pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £')

I still get NaN data again as output.

Could someone explain why this is happening and how to prevent it?

Upvotes: 4

Views: 1790

Answers (2)

jezrael

Reputation: 862741

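It comes from your input data. pivot_table turns the values of one column into the index and the values of another into the columns; each cell holds the aggregated value for that (index, column) combination. If a combination does not exist in the input data, its cell has nothing to aggregate and comes out as missing (NaN). Calling fillna(0) beforehand does not help, because these NaN are created by the reshaping rather than being present in the input.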

import pandas as pd

df = pd.DataFrame({
    'Source': list('abcdef'),
    'Total billed £': [5, 3, 6, 9, 2, 4],
    'Customer Location': list('adfbbb'),
})

print (df)
  Source  Total billed £ Customer Location
0      a               5                 a
1      b               3                 d
2      c               6                 f
3      d               9                 b
4      e               2                 b
5      f               4                 b

# e.g. there is no row with Source=a and Customer Location=b, so that cell is NaN in the output
print (pd.pivot_table(df,index='Source', columns='Customer Location',values='Total billed £'))
Customer Location    a    b    d    f
Source                               
a                  5.0  NaN  NaN  NaN
b                  NaN  NaN  3.0  NaN
c                  NaN  NaN  NaN  6.0
d                  NaN  9.0  NaN  NaN
e                  NaN  2.0  NaN  NaN
f                  NaN  4.0  NaN  NaN
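
To check which cells will end up as NaN, one option is to count the input rows behind each (Source, Customer Location) pair with pd.crosstab: a count of 0 marks a combination that is absent from the input. This is only a minimal sketch on the same sample data; the names counts and pivot are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Source': list('abcdef'),
    'Total billed £': [5, 3, 6, 9, 2, 4],
    'Customer Location': list('adfbbb'),
})

# The input itself contains no NaN, so fillna(0) on df changes nothing
print(df.isna().sum().sum())

# Number of input rows for each (Source, Customer Location) pair
counts = pd.crosstab(df['Source'], df['Customer Location'])
print(counts)

pivot = pd.pivot_table(df, index='Source', columns='Customer Location',
                       values='Total billed £')

# Compare the two masks cell by cell: pairs with zero rows vs. NaN cells
same_mask = ((counts == 0).to_numpy() == pivot.isna().to_numpy()).all()
print(same_mask)
```

If the two masks match, every NaN in the pivot table is explained by a missing combination rather than by missing values in the input.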

Furthermore, here's a good read on reshaping data.

Upvotes: 3

Dani Mesejo

Reputation: 61910

The reason is simple: there is a pair of (index, column) values that is missing from your data. For example:

import pandas as pd

df = pd.DataFrame({"Source": ["foo", "bar", "bar", "bar"],
                   "Customer Location": ["one", "one", "two", "two"],
                   "Total billed £": [10, 20, 30, 40]})

print(df)

Setup

  Source Customer Location  Total billed £
0    foo               one              10
1    bar               one              20
2    bar               two              30
3    bar               two              40

As you can see, there is no ('foo', 'two') pair in your data, so that cell is NaN when you do:

result = pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £')
print(result)

Output

Customer Location   one   two
Source                       
bar                20.0  35.0
foo                10.0   NaN

To fix the problem, provide a default value using the fill_value parameter:

result = pd.pivot_table(df, index='Source', columns='Customer Location', values='Total billed £', fill_value=0)

Output

Customer Location  one  two
Source                     
bar                 20   35
foo                 10    0
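
If the pivot table has already been built without fill_value, calling fillna on the result should give the same zeros. A minimal sketch on the same data follows; the name filled is illustrative, and the values stay floats here because the un-filled table is float.

```python
import pandas as pd

df = pd.DataFrame({"Source": ["foo", "bar", "bar", "bar"],
                   "Customer Location": ["one", "one", "two", "two"],
                   "Total billed £": [10, 20, 30, 40]})

# Pivot first: the ('foo', 'two') cell is NaN because no such row exists
result = pd.pivot_table(df, index='Source', columns='Customer Location',
                        values='Total billed £')

# Fill the gaps afterwards, on the reshaped table rather than on the input
filled = result.fillna(0)
print(filled)
```

The key point is the order: fillna has to be applied after the reshaping, since that is where the missing cells appear.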

Upvotes: 2
