Reputation: 103
I have a pandas dataframe with a column called 'letters' that has letters of the alphabet as its values. So the first row could be 'C', next row 'Z', next 'M' so on and so forth:
C
Z
M
A
B
O
....
I also have lists that groups the letters of the alphabet by where they fall, so:
early_alph = ['A','B','C'....'M']
late_alph = ['N','O','P'....'Z']
I would like to to add a new column to my dataframe that groups the letters of the alphabet depending on which list they were in. If this was SQL I would write something like the following:
%let early_alph = ('A','B','C'....'M');
%let late_alph = ('N','O','P'....'Z');
create table my_df as
select
letters,
case when letters IN &early_alph. then 'early_alph'
when letters IN &late_alph then 'late_alph'
else 'unknown' end as 'Alph_group'
from my_table;
The output I would be looking for is:
letters Alph_group
C early_alph
Z late_alph
M early_alph
A early_alph
B early_alph
O late_alph
I'm very new to Python and pandas, but in my research it seems like Numpy's np.where was promising, but every example I found tested against one value (rather than a list of values). Any help would be greatly appreciated.
Upvotes: 2
Views: 1161
Reputation: 215057
You can use numpy.where
+ isin
:
import pandas as pd
import numpy as np
Set Up:
letters = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
df = pd.DataFrame(np.random.choice(letters, 10), columns=['letters'])
early_alph = letters[:13]
late_alph = letters[13:]
df['cat'] = np.where(df.letters.isin(early_alph), 'early_alph',
np.where(df.letters.isin(late_alph), 'late_alph', 'unkown'))
df
#letters cat
#0 B early_alph
#1 Y late_alph
#2 W late_alph
#3 U late_alph
#4 A early_alph
#5 K early_alph
#6 J early_alph
#7 C early_alph
#8 A early_alph
#9 C early_alph
Or another option, create a dictionary from the early_alph
and late_alph
lists that maps from letter
to the categories and then use map
method:
dd = dict([(early, 'early_alph') for early in early_alph] +
[(late, 'late_alph') for late in late_alph])
df['cat'] = df.letters.map(dd).fillna('unkown')
df
#letters cat
#0 B early_alph
#1 Y late_alph
#2 W late_alph
#3 U late_alph
#4 A early_alph
#5 K early_alph
#6 J early_alph
#7 C early_alph
#8 A early_alph
#9 C early_alph
Upvotes: 4
Reputation: 62007
One way to do this is with a nested where.
np.where(df.letters.between('A', 'M'), 'early_alph', np.where(df.letters.between('N','Z'), 'late_alph', 'unknown')
Upvotes: 3