semiflex
semiflex

Reputation: 1246

Using pandas.qcut while taking into account a column Python

I have the following dataframe:

enter image description here

And using the pandas.qcut function I'm trying to make a new column that cuts by Animal by a factor of 3 like so:

enter image description here

Here is my code so far:

    import pandas as pd
df=pd.DataFrame({'Name':['Harry','Sally','Mary','John','Francis','Devon','James','Holly','Molly','Nancy','Ben'], 'Score': [43,234,54,34,12,43,54,65,23,12,32],
                 'Animal': ['dog', 'dog', 'cat', 'cat', 'dog', 'horse', 'dog', 'snake', 'dog', 'mouse', 'horse']})
tiles = pd.qcut(df.index, 3, labels=False)
tiles=tiles+1
df['tiles']=tiles
print(df)

How do I make my qcut function take into my account my 'Animal' column?

Thanks in advance.

Upvotes: 0

Views: 49

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 31011

The key to success is a proper function generating Tile values:

def tbl(x):
    ccl = itertools.cycle([1,2,3])
    lst = [ next(ccl) for _ in range(len(x)) ]
    return pd.Series(lst, x.index)

It works almost like cumcount() with one difference: Instead of consecutive numbers (from range) it generates a cycled sequence of [1,2,3], using itertools.cycle.

Then all you have to do (after necessary imports and creation of the source DataFrame) is:

  • Sort values by Animal.
  • Group by Animal, take any column (e.g. Name) and apply to them the above mentioned function.

So the whole script (without tbl declaration) can be as follows:

import pandas as pd
import itertools

df = pd.DataFrame( {'Name': ['Harry', 'Sally', 'Mary', 'John', 'Francis',
        'Devon', 'James', 'Holly', 'Molly', 'Nancy', 'Ben'],
    'Score': [43, 234, 54, 34, 12, 43, 54, 65, 23, 12, 32],
    'Animal': ['dog', 'dog', 'cat', 'cat', 'dog', 'horse', 'dog', 'snake',
        'cat', 'mouse', 'mouse']})
df.sort_values(by='Animal', inplace=True)
df['Tile'] = df.groupby('Animal')['Name'].apply(tbl)

When you print df, the result is:

       Name  Score Animal  Tile
2      Mary     54    cat     1
3      John     34    cat     2
8     Molly     23    cat     3
0     Harry     43    dog     1
1     Sally    234    dog     2
4   Francis     12    dog     3
6     James     54    dog     1
5     Devon     43  horse     1
9     Nancy     12  mouse     1
10      Ben     32  mouse     2
7     Holly     65  snake     1

Upvotes: 1

sacuL
sacuL

Reputation: 51395

I don't know if qcut is the way to go, but you can do this using groupby('Animal').cumcount, and taking the modulus of the cumulative count and 3 (% 3):

>>> df
   Animal     Name
0     cat    Harry
1     cat    Sally
2     cat     Mary
3     dog     John
4     dog  Francis
5     dog    Devon
6     dog    James
7   horse    Holly
8   mouse    Molly
9   mouse    Nancy
10  snake      Ben

df['Tile'] = (df.groupby('Animal').cumcount()%3)+1

>>> df
   Animal     Name  Tile
0     cat    Harry     1
1     cat    Sally     2
2     cat     Mary     3
3     dog     John     1
4     dog  Francis     2
5     dog    Devon     3
6     dog    James     1
7   horse    Holly     1
8   mouse    Molly     1
9   mouse    Nancy     2
10  snake      Ben     1

Upvotes: 2

Related Questions