Reputation: 103

Adding a conditional column to a dataframe if the value is in a list

I have a pandas dataframe with a column called 'letters' that has letters of the alphabet as its values. So the first row could be 'C', next row 'Z', next 'M' so on and so forth:

letters

....

I also have lists that groups the letters of the alphabet by where they fall, so:

early_alph = ['A','B','C'....'M']
late_alph = ['N','O','P'....'Z']

I would like to to add a new column to my dataframe that groups the letters of the alphabet depending on which list they were in. If this was SQL I would write something like the following:

%let early_alph = ('A','B','C'....'M');
%let late_alph = ('N','O','P'....'Z');

create table my_df as 
 select 
  letters, 
  case when letters IN &early_alph. then 'early_alph' 
       when letters IN &late_alph then 'late_alph' 
       else 'unknown' end as 'Alph_group'
from my_table;

The output I would be looking for is:

letters Alph_group

C early_alph

Z late_alph

M early_alph

A early_alph

B early_alph

O late_alph

I'm very new to Python and pandas, but in my research it seems like Numpy's np.where was promising, but every example I found tested against one value (rather than a list of values). Any help would be greatly appreciated.

Upvotes: 2

Answers (2)

akuiper

Reputation: 215057

You can use numpy.where + isin:

import pandas as pd
import numpy as np

Set Up:

letters = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')    
df = pd.DataFrame(np.random.choice(letters, 10), columns=['letters'])
early_alph = letters[:13]
late_alph = letters[13:]

df['cat'] = np.where(df.letters.isin(early_alph), 'early_alph',
            np.where(df.letters.isin(late_alph), 'late_alph', 'unkown'))

df
#letters       cat
#0    B early_alph
#1    Y  late_alph
#2    W  late_alph
#3    U  late_alph
#4    A early_alph
#5    K early_alph
#6    J early_alph
#7    C early_alph
#8    A early_alph
#9    C early_alph

Or another option, create a dictionary from the early_alph and late_alph lists that maps from letter to the categories and then use map method:

dd = dict([(early, 'early_alph') for early in early_alph] + 
          [(late, 'late_alph') for late in late_alph])

df['cat'] = df.letters.map(dd).fillna('unkown')
df
#letters       cat
#0  B   early_alph
#1  Y    late_alph
#2  W    late_alph
#3  U    late_alph
#4  A   early_alph
#5  K   early_alph
#6  J   early_alph
#7  C   early_alph
#8  A   early_alph
#9  C   early_alph

Upvotes: 4

Ted Petrou

Reputation: 62007

One way to do this is with a nested where.

np.where(df.letters.between('A', 'M'), 'early_alph', np.where(df.letters.between('N','Z'), 'late_alph', 'unknown')

Upvotes: 3

Adding a conditional column to a dataframe if the value is in a list

letters

Answers (2)

Related Questions