pandas set columns based on values of the same column

Question

I have the following dataframe:

userid       date
  1        2010-01-03
  2        2009-01-04
  3        2004-02-03
  1        2007-01-01

I want to create a column that outputs true if there is another similar userid in another row with an earlier date. i.e.

userid       date        userid_seen
  1        2010-01-03        t
  2        2009-01-04        f
  3        2004-02-03        f
  1        2007-01-01        f

How do I do it?

EdChum · Accepted Answer

This will work by calling apply and passing param axis=1 to apply it row-wise:

In [88]:

def func(x):
    if len(df.loc[(df['userid'] == x.userid) & (df['date'] != x.date), 'date']) > 0:
        return (df.loc[(df['userid'] == x.userid) & (df['date'] != x.date), 'date'] < x.date).values.max()
    return False
df['user_id_seen'] = df.apply(lambda row: func(row), axis=1)
df
Out[88]:
   userid       date user_id_seen
0       1 2010-01-03         True
1       2 2009-01-04        False
2       3 2004-02-03        False
3       1 2007-01-01        False

UPDATE

Although the above works it will be slow for large dataframes as @MattiJohn correctly points out as this effectively iterates over each row.

The following is a more compact answer similar to @MattiJohn's answer:

In [102]:

df['user_id_seen'] = df.groupby('userid')['date'].transform('min') < df.date
df
Out[102]:
   userid       date user_id_seen
0       1 2010-01-03         True
1       2 2009-01-04        False
2       3 2004-02-03        False
3       1 2007-01-01        False

pandas set columns based on values of the same column

Answers (2)

Related Questions