abbassix

Reputation: 645

How to use pandas' cut method for different sections of a data frame?

I scraped some data from a local website where people sell their cars. I have useful data such as each car's manufacturing year, mileage and price. I have a Year-Price plot, but I want to color it by mileage, BUT in such a way that, for each year, that year's mileages are split into three categories.

I tried this code:

df.milage_year = df.loc[df.year==1395].cut(df.milage, 3, labels=['g', 'y', 'r'])

But I get this error:

AttributeError: 'DataFrame' object has no attribute 'cut'
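The AttributeError comes from calling cut as a DataFrame method: cut is a top-level pandas function, pd.cut, which takes the Series to bin as its first argument. Below is a minimal sketch of that call. It assumes a df with the mileage and year columns from the sample data further down, with only six of its rows reproduced. It uses year 1396 because that year has several cars. The variable name bins_1396 is just for illustration. It bins one year's mileages into three equal-width value ranges:

```python
import pandas as pd

# Six rows of the sample data; note the column is spelled "mileage".
df = pd.DataFrame({
    "mileage": [41000, 33011, 2200, 54033, 27223, 65500],
    "year": [1396, 1396, 1397, 1395, 1396, 1396],
})

# pd.cut is a module-level function: pass it the Series to bin.
# Three equal-width bins over the min..max range of the 1396 mileages.
bins_1396 = pd.cut(df.loc[df["year"] == 1396, "mileage"], 3, labels=["g", "y", "r"])
print(bins_1396)
```

This only handles a single year, and the bins are equal-width in mileage, not equal in the number of cars.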

UPDATE: The cut method bins by value ranges! But if I want to categorize by the number of cases instead, which method should I use?
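pd.qcut is the count-based counterpart of pd.cut. It places the bin edges at quantiles, so each bin holds roughly the same number of rows. The sketch below contrasts the two on a made-up, skewed Series. The values are illustrative only and do not come from the scraped data:

```python
import pandas as pd

# Made-up, skewed mileages: five low values and one very high one.
mileage = pd.Series([1000, 2000, 3000, 4000, 5000, 100000])

# pd.cut: three bins of equal *width* over the value range.
by_value = pd.cut(mileage, 3, labels=["g", "y", "r"])

# pd.qcut: edges at the 1/3 and 2/3 quantiles, so roughly equal *counts*.
by_count = pd.qcut(mileage, 3, labels=["g", "y", "r"])

print(list(by_value))  # ['g', 'g', 'g', 'g', 'g', 'r']
print(list(by_count))  # ['g', 'g', 'y', 'y', 'r', 'r']
```

The single outlier stretches cut's value range, so five of the six rows land in 'g'. qcut splits the rows two by two.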

UPDATE 2: This is my input data:

    mileage price       year
0   41000   70000000    1396
1   33011   73000000    1396
2   2200    81000000    1397
3   116000  45000000    1389
4   18000   71000000    1394
5   54033   65000000    1395
6   183000  42000000    1385
7   226053  44000000    1387
8   150000  45000000    1387
9   4000    78000000    1397
10  246000  42500000    1388
11  143500  35000000    1382
12  197000  40000000    1387
13  250000  38000000    1385
14  2795    81000000    1397
15  17000   40000000    1397
16  180000  30000000    1389
17  100000  61000000    1394
18  27223   71000000    1396
19  140000  49500000    1388
20  65500   71000000    1396

My expected output is a new column named mileage_year with three possible values, 'g', 'y' and 'r'. These values are assigned to the mileages within each year, so that the one-third of cases with the highest mileage get 'r', the one-third with the lowest mileage get 'g', and the remaining one-third get 'y'.

Upvotes: 2

Views: 3737

Answers (1)

Franco Piccolo

Reputation: 7410

You can group by year and then apply qcut to each group with transform, so the result stays aligned with the original rows:

df['mileage_year'] = df.groupby('year')['mileage'].transform(lambda x: pd.qcut(x, 3, labels=['g', 'y', 'r'], duplicates='drop'))
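The qcut approach needs every year to have enough distinct mileages to produce three different bin edges. In the sample data, 1382 and 1395 have a single car each. With an explicit labels list, qcut then raises a ValueError ("Bin labels must be one fewer than the number of bin edges") even with duplicates='drop'. Below is a minimal sketch of a count-based alternative. It is my own substitution, not part of the answer above. It ranks each mileage within its year, turns the rank into a mid-point percentile (rank - 0.5) / n, and cuts that percentile at 1/3 and 2/3. The names grouped, rank, n and pct, and the mid-point choice itself, are assumptions for illustration:

```python
import pandas as pd

# Sample data from the question (price column omitted for brevity).
df = pd.DataFrame({
    "mileage": [41000, 33011, 2200, 116000, 18000, 54033, 183000,
                226053, 150000, 4000, 246000, 143500, 197000, 250000,
                2795, 17000, 180000, 100000, 27223, 140000, 65500],
    "year": [1396, 1396, 1397, 1389, 1394, 1395, 1385,
             1387, 1387, 1397, 1388, 1382, 1387, 1385,
             1397, 1397, 1389, 1394, 1396, 1388, 1396],
})

grouped = df.groupby("year")["mileage"]

# Rank 1..n within each year; method="first" breaks ties by row order.
rank = grouped.rank(method="first")
# Number of non-null mileages in each car's year, aligned to df's rows.
n = grouped.transform("count")

# Mid-point percentile, strictly between 0 and 1; it never equals 1/3 or 2/3.
pct = (rank - 0.5) / n

# Lowest third -> 'g', middle third -> 'y', highest third -> 'r'.
df["mileage_year"] = pd.cut(pct, bins=[0, 1 / 3, 2 / 3, 1], labels=["g", "y", "r"])
print(df)
```

With this choice, a year with one car gets 'y', a year with two cars gets 'g' and 'r', and a year with three cars gets one of each. Four cars cannot be split evenly into thirds, so 1396 and 1397 each come out as one 'g', two 'y' and one 'r'.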

Upvotes: 3
