Reputation: 645
I scraped some data from a local website where people sell their cars. It includes useful fields such as each car's manufacturing year, mileage, and price. I have a year-price plot, and I want to color the points by mileage, but with the mileages binned into three categories separately for each year.
I tried this code:
df.milage_year = df.loc[df.year==1395].cut(df.milage, 3, labels=['g', 'y', 'r'])
But I get this error:
AttributeError: 'DataFrame' object has no attribute 'cut'
UPDATE: The cut method bins according to the values themselves (equal-width ranges). Which method should we use if we want to categorize according to the number of cases instead?
UPDATE 2: This is my input data:
mileage price year
0 41000 70000000 1396
1 33011 73000000 1396
2 2200 81000000 1397
3 116000 45000000 1389
4 18000 71000000 1394
5 54033 65000000 1395
6 183000 42000000 1385
7 226053 44000000 1387
8 150000 45000000 1387
9 4000 78000000 1397
10 246000 42500000 1388
11 143500 35000000 1382
12 197000 40000000 1387
13 250000 38000000 1385
14 2795 81000000 1397
15 17000 40000000 1397
16 180000 30000000 1389
17 100000 61000000 1394
18 27223 71000000 1396
19 140000 49500000 1388
20 65500 71000000 1396
My expected output is a new column named mileage_year with three values, 'g', 'y' and 'r', assigned within each year so that the one-third of cases with the highest mileage get 'r', the one-third with the lowest mileage get 'g', and the remaining one-third get 'y'.
Upvotes: 2
Views: 3737
Reputation: 7410
You can group by year and then apply qcut, which bins by quantiles (roughly equal counts per bin) rather than by equal-width value ranges, like:
df['mileage_year'] = df.groupby('year').mileage.apply(lambda x: pd.qcut(x, 3, labels=['g', 'y', 'r'], duplicates='drop'))
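One caveat: qcut may raise a ValueError when a year has too few distinct mileages to form three bins (for example a year with a single listing, like 1395 in the sample data), because the number of labels no longer matches the number of bin edges left after duplicates are dropped. As a rough alternative, here is a minimal sketch that ranks mileages within each year and splits the ranks into thirds with integer arithmetic. The tie-breaking rule (method='first') and the choice to label short groups starting from 'g' are my assumptions, not something the question specifies, and the sketch assumes there are no missing mileages.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'mileage': [41000, 33011, 2200, 116000, 18000, 54033, 183000, 226053,
                150000, 4000, 246000, 143500, 197000, 250000, 2795, 17000,
                180000, 100000, 27223, 140000, 65500],
    'price': [70000000, 73000000, 81000000, 45000000, 71000000, 65000000,
              42000000, 44000000, 45000000, 78000000, 42500000, 35000000,
              40000000, 38000000, 81000000, 40000000, 30000000, 61000000,
              71000000, 49500000, 71000000],
    'year': [1396, 1396, 1397, 1389, 1394, 1395, 1385, 1387, 1387, 1397,
             1388, 1382, 1387, 1385, 1397, 1397, 1389, 1394, 1396, 1388,
             1396],
})

grouped = df.groupby('year')['mileage']

# 0-based position of each car's mileage within its own year
# (assumption: ties are broken by order of appearance).
position = grouped.rank(method='first').astype(int) - 1

# Number of cars listed for that year, aligned to the original rows.
group_size = grouped.transform('size')

# Map each position to a third: 0 = lowest, 1 = middle, 2 = highest.
tercile = (position * 3) // group_size

df['mileage_year'] = tercile.map({0: 'g', 1: 'y', 2: 'r'})
print(df.sort_values(['year', 'mileage']))
```

The resulting column can be passed as the color argument of a scatter plot, e.g. plt.scatter(df.year, df.price, c=df.mileage_year) with matplotlib. Note that with this rule a year with only one or two listings never reaches 'r'; adjust the mapping if you prefer a different convention for small groups.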
Upvotes: 3