maynull
maynull

Reputation: 2046

How to select certain columns by multiple conditions?

I've made a dataframe from a dictionary to manipulate it.

dic_people = defaultdict(dict)

dic_people['A']['language']    = 'English'
dic_people['A']['nationality'] = 'Russia'
dic_people['A']['joined']      = 201010

dic_people['B']['language']    = 'French'
dic_people['B']['nationality'] = 'Canada'
dic_people['B']['joined']      = 201009

dic_people['C']['language']    = 'English'
dic_people['C']['nationality'] = 'Canada'
dic_people['C']['joined']      = 201008

dic_people['D']['language']    = 'French'
dic_people['D']['nationality'] = 'France'
dic_people['D']['joined']      = 201007

dic_people['E']['language']    = 'English'
dic_people['E']['nationality'] = 'Ireland'
dic_people['E']['joined']      = 201011

df = pd.DataFrame.from_dict(dic_people)

>>> df
                A       B        C       D        E
joined        201010  201009   201008  201007   201011
language     English  French  English  French  English
nationality   Russia  Canada   Canada  France  Ireland

I want to select 2 people who 1) joined earliest and 2) speak English. So, the result will be

                A       C      
joined        201010  201008  
language     English  English 
nationality   Russia  Canada 

I'd like to know how to do this. I tried df[ df.loc['language'] == 'English'], but it seems it's somewhat different when referring to rows, not columns.

Upvotes: 2

Views: 59

Answers (1)

jezrael
jezrael

Reputation: 862406

Use DataFrame.loc for filter by rows and for first 2 column select by positions by DataFrame.iloc:

#if necessary sorting by 'joined' value in index
#df = df.sort_values('joined', axis=1)

df1 = df.loc[:,  df.loc['language'] == 'English'].iloc[:, :2]
print (df1)
                   A        C
language     English  English
nationality   Russia   Canada
joined        201010   201008

Upvotes: 6

Related Questions