Reputation: 3433
The data:
df = pd.DataFrame()
df['date'] = pd.date_range("20130101", periods=12)
df['bol'] = np.array([0,1,1,0,1,1,0,1,1,0,1,1])
df['language'] = np.array(['FR','EN','FR','EN','FR','EN','FR','EN','FR','EN','FR','EN'])
df['doctype'] = np.array(['FAM','SELF','FAM','SELF','FAM','FAM','FAM','SELF','FAM','SELF','FAM','FAM'])
set_claims = [[i,i+1] for i in range(12)]
df['my_list'] = pd.Series(set_claims, index=df.index)
Now I would like to query rows with 'EN' and 'SELF'
df.loc[(df.language=='EN') & (df.doctype=='SELF'), ]
In the same line there should be a way to get only the row of the oldest item, i.e. how do I add in that line the query df['date'] being the oldest.
The ultimate goal is to get the value of my_list with language=EN, doctype = 'SELF' being the oldest.
I got a solution in two lines, but I simply dont like it. not "elegant"
oldest = df.loc[(df.language=='EN') & (df.doctype=='SELF'), ].date.min()
df.loc[df.date == oldest,'my_list']
Is there a more straight forward way to make this?
EDITED ANSWER:
myval = (df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date',ascending=True).head(1)['my_list'].iloc[0]
If you change ascending=False then you get the most recent entry. With iloc you get the value rather than a pandas object
Upvotes: 0
Views: 406
Reputation: 13821
One way would be to sort
your result, and use head(1)
to get minimum date.
(df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date').head(1)
date bol language doctype my_list
1 2013-01-02 1 EN SELF [1, 2]
And if you want just the my_list
values back:
(df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date').head(1)['my_list']
1 [1, 2]
Name: my_list, dtype: object
Upvotes: 2
Reputation: 862661
If there is only one oldest date is possible use Series.idxmin
for index of row with minimal value:
idx = df.loc[(df.language=='EN') & (df.doctype=='SELF'), 'date'].idxmin()
df.loc[idx,'my_list']
Upvotes: 1