JFerro
JFerro

Reputation: 3433

python pandas column value of the oldest date in other column

The data:

df = pd.DataFrame()
df['date'] = pd.date_range("20130101", periods=12) 
df['bol']  = np.array([0,1,1,0,1,1,0,1,1,0,1,1])
df['language'] = np.array(['FR','EN','FR','EN','FR','EN','FR','EN','FR','EN','FR','EN'])
df['doctype'] = np.array(['FAM','SELF','FAM','SELF','FAM','FAM','FAM','SELF','FAM','SELF','FAM','FAM'])
set_claims = [[i,i+1] for i in range(12)]
df['my_list'] = pd.Series(set_claims, index=df.index)

Now I would like to query rows with 'EN' and 'SELF'

df.loc[(df.language=='EN') & (df.doctype=='SELF'), ]

In the same line there should be a way to get only the row of the oldest item, i.e. how do I add in that line the query df['date'] being the oldest.

The ultimate goal is to get the value of my_list with language=EN, doctype = 'SELF' being the oldest.

I got a solution in two lines, but I simply dont like it. not "elegant"

oldest = df.loc[(df.language=='EN') & (df.doctype=='SELF'), ].date.min()
df.loc[df.date == oldest,'my_list']

Is there a more straight forward way to make this?


EDITED ANSWER:

myval = (df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date',ascending=True).head(1)['my_list'].iloc[0]

If you change ascending=False then you get the most recent entry. With iloc you get the value rather than a pandas object

Upvotes: 0

Views: 406

Answers (2)

sophocles
sophocles

Reputation: 13821

One way would be to sort your result, and use head(1) to get minimum date.

(df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date').head(1)

        date  bol language doctype my_list
1 2013-01-02    1       EN    SELF  [1, 2]

And if you want just the my_list values back:

(df.loc[(df.language=='EN') & (df.doctype=='SELF'),]).sort_values(by='date').head(1)['my_list']

1    [1, 2]
Name: my_list, dtype: object

Upvotes: 2

jezrael
jezrael

Reputation: 862661

If there is only one oldest date is possible use Series.idxmin for index of row with minimal value:

idx = df.loc[(df.language=='EN') & (df.doctype=='SELF'), 'date'].idxmin()
df.loc[idx,'my_list']

Upvotes: 1

Related Questions