Reputation: 1246
I have the following dataframe:
And using the pandas.qcut function I'm trying to make a new column that cuts by Animal by a factor of 3 like so:
Here is my code so far:
import pandas as pd
df=pd.DataFrame({'Name':['Harry','Sally','Mary','John','Francis','Devon','James','Holly','Molly','Nancy','Ben'], 'Score': [43,234,54,34,12,43,54,65,23,12,32],
'Animal': ['dog', 'dog', 'cat', 'cat', 'dog', 'horse', 'dog', 'snake', 'dog', 'mouse', 'horse']})
tiles = pd.qcut(df.index, 3, labels=False)
tiles=tiles+1
df['tiles']=tiles
print(df)
How do I make my qcut function take into my account my 'Animal' column?
Thanks in advance.
Upvotes: 0
Views: 49
Reputation: 31011
The key to success is a proper function generating Tile
values:
def tbl(x):
ccl = itertools.cycle([1,2,3])
lst = [ next(ccl) for _ in range(len(x)) ]
return pd.Series(lst, x.index)
It works almost like cumcount()
with one difference: Instead of consecutive
numbers (from range
) it generates a cycled sequence of [1,2,3]
,
using itertools.cycle
.
Then all you have to do (after necessary imports and creation of the source DataFrame) is:
Animal
.Animal
, take any column (e.g. Name
) and apply to them
the above mentioned function.So the whole script (without tbl
declaration) can be as follows:
import pandas as pd
import itertools
df = pd.DataFrame( {'Name': ['Harry', 'Sally', 'Mary', 'John', 'Francis',
'Devon', 'James', 'Holly', 'Molly', 'Nancy', 'Ben'],
'Score': [43, 234, 54, 34, 12, 43, 54, 65, 23, 12, 32],
'Animal': ['dog', 'dog', 'cat', 'cat', 'dog', 'horse', 'dog', 'snake',
'cat', 'mouse', 'mouse']})
df.sort_values(by='Animal', inplace=True)
df['Tile'] = df.groupby('Animal')['Name'].apply(tbl)
When you print df
, the result is:
Name Score Animal Tile
2 Mary 54 cat 1
3 John 34 cat 2
8 Molly 23 cat 3
0 Harry 43 dog 1
1 Sally 234 dog 2
4 Francis 12 dog 3
6 James 54 dog 1
5 Devon 43 horse 1
9 Nancy 12 mouse 1
10 Ben 32 mouse 2
7 Holly 65 snake 1
Upvotes: 1
Reputation: 51395
I don't know if qcut
is the way to go, but you can do this using groupby('Animal').cumcount
, and taking the modulus of the cumulative count and 3 (% 3
):
>>> df
Animal Name
0 cat Harry
1 cat Sally
2 cat Mary
3 dog John
4 dog Francis
5 dog Devon
6 dog James
7 horse Holly
8 mouse Molly
9 mouse Nancy
10 snake Ben
df['Tile'] = (df.groupby('Animal').cumcount()%3)+1
>>> df
Animal Name Tile
0 cat Harry 1
1 cat Sally 2
2 cat Mary 3
3 dog John 1
4 dog Francis 2
5 dog Devon 3
6 dog James 1
7 horse Holly 1
8 mouse Molly 1
9 mouse Nancy 2
10 snake Ben 1
Upvotes: 2