Reputation: 91
I am using norway_new_car_sales_by_model.csv Dataset here dataset which you find. I want to find the model that has highest sales fluctuation over the years. I am using standard deviation of the yearly total sales for each model.
Expected output is:
import pandas as pd
import numpy as np
data=pd.read_csv("norway_new_car_sales_by_model.csv",header=None,encoding="latin-1")
data.columns = ['Year','Month','Make','Model','Quantity','Pct']#give column name
data.drop(data.head(1).index, inplace=True) #drop first row
data[['Quantity']]=data[['Quantity']].astype(np.int64)
data.dropna(subset=['Quantity'], how='all', inplace = True)
maketotal_1 = data.pivot_table(values='Quantity',index=['Month','Model','Make'],aggfunc=np.std)
My question are that
1) I did not handle nan values... even if i try many codes...
2) How to get Audi A4 Audi from Index column
Upvotes: 2
Views: 3401
Reputation: 863751
I think need:
First remove parameter header=None
from read_csv
, because first in csv are columns names:
data=pd.read_csv("norway_new_car_sales_by_model.csv",encoding="latin-1")
print (data.head())
Year Month Make Model Quantity Pct
0 2007 1 Volkswagen Volkswagen Passat 1267 10.0
1 2007 1 Toyota Toyota Rav4 819 6.5
2 2007 1 Toyota Toyota Avensis 787 6.2
3 2007 1 Volkswagen Volkswagen Golf 720 5.7
4 2007 1 Toyota Toyota Corolla 691 5.4
Apply pivot_table
function with np.std
:
maketotal_1=data.pivot_table(values='Quantity',index=['Month','Model','Make'],aggfunc=np.std)
print (maketotal_1.head())
Quantity
Month Model Make
1 Audi A3 Audi 50.986109
Audi A4 Audi 60.549704
Audi A6 Audi NaN
Audi Q3 Audi NaN
BMW 2-serie BMW NaN
Last first remove NaN
s by dropna
and use reset_index
for convert MultiIndex
to columns and create unique default index:
df1 = maketotal_1.dropna().reset_index()
Last per groups by Make
get indices of max values by idxmax
and then select rows by loc
:
df3 = df1.loc[df1.groupby('Make')['Quantity'].idxmax()]
print (df3)
Month Model Make Quantity
447 12 Audi A3 Audi 119.867427
415 11 BMW i3 BMW 460.936366
56 2 Ford Mondeo Ford 169.889880
235 6 Honda CR-V Honda 171.579671
457 12 Hyundai ix35 Hyundai 32.526912
348 9 Kia Sportage Kia 55.154329
60 2 Mazda CX-5 Mazda 144.030957
14 1 Mercedes-Benz GLC Mercedes-Benz 119.501046
160 4 Mitsubishi ASX Mitsubishi 312.541197
391 10 Nissan Leaf Nissan 225.322584
114 3 Opel Astra Opel 85.182158
22 1 Peugeot 207 Peugeot 97.962578
168 4 Renault Zoe Renault 53.740115
395 10 Skoda Octavia Skoda 121.668767
122 3 Suzuki Vitara Suzuki 85.559921
123 3 Tesla Model S Tesla 510.400823
33 1 Toyota Corolla Toyota 326.683333
179 4 Volkswagen Golf Volkswagen 454.872681
485 12 Volvo V40 Volvo 183.919366
EDIT:
There is no Citroen
because np.std
return NaN
:
print (maketotal_1[maketotal_1.index.get_level_values('Make') == 'Citroen '])
Quantity
Month Model Make
11 Citroen C4 Aircross Citroen NaN
Upvotes: 2