BKS

Reputation: 2333

How to groupby with a condition in pandas

I have a pandas dataframe with the following structure:

author    Year   co_author
A         1990   B
A         1990   C
A         1991   B
A         1994   D
A         1995   D
B         1990   A
B         1991   C
B         1991   E
B         1998   C

For each author, I would like to list the co-authors they worked with in each 3-year window. For the data above, the result should be:

author    3-Year-window   co_authors_list
A         1990-1992       [B,C]
A         1991-1993       [B]
A         1992-1994       [D]            
A         1994-1996       [D]
A         1995-1997       [D]
B         1990-1992       [A,C,E]
B         1991-1993       [C,E]
B         1998-2000       [C]

I know how to group with a one-year window, but not with a three-year one. This is my code for the one-year window:

df.groupby(['author','Year'])['co_author'].apply(list)
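To make the sample reproducible, here is a minimal sketch that types the table above into a DataFrame (the construction is an assumption, since the question does not show how `df` was built) and runs the one-year grouping on it:

```python
import pandas as pd

# Sample data typed in by hand from the table in the question
df = pd.DataFrame({
    'author':    ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Year':      [1990, 1990, 1991, 1994, 1995, 1990, 1991, 1991, 1998],
    'co_author': ['B', 'C', 'B', 'D', 'D', 'A', 'C', 'E', 'C'],
})

# One list of co-authors per (author, Year) pair
one_year = df.groupby(['author', 'Year'])['co_author'].apply(list)
print(one_year)
```

Each (author, Year) pair gets its own list, so a three-year window, which needs rows from several Year groups at once, cannot be expressed by grouping on Year alone.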

Upvotes: 3

Views: 64

Answers (1)

BENY

Reputation: 323376

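I am using NumPy broadcasting inside a groupby, then re-creating the DataFrame. One window [Year, Year + 2] is built for each row, and duplicate author/window pairs are dropped at the end: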

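import numpy as np
import pandas as pd

l = []
for author, y in df.groupby('author'):
    s = y.Year.values
    # a[i, j] = s[j] - s[i], broadcast to a square matrix
    a = s - s[:, None]
    # row i is True for the rows whose Year lies in [s[i], s[i] + 2]
    mask = np.logical_and(a >= 0, a <= 2)
    l.extend(list(y.co_author[m].unique()) for m in mask)
# groupby sorts by author, so sort df the same way to keep its rows aligned with l
df = df.sort_values('author', kind='stable')
df = pd.DataFrame({'author': df.author,
                   'Year': df.Year.astype(str) + '-' + (df.Year + 2).astype(str),
                   'co_authors_list': l}).drop_duplicates(['author', 'Year'])
df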
Out[337]: 
  author       Year co_authors_list
0      A  1990-1992          [B, C]
2      A  1991-1993             [B]
3      A  1994-1996             [D]
4      A  1995-1997             [D]
5      B  1990-1992       [A, C, E]
6      B  1991-1993          [C, E]
8      B  1998-2000             [C]
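To see what the broadcast step does, here is a small standalone sketch on author A's rows only (the arrays are copied by hand from the question's table; `co` and `windows` are illustrative names, not part of the answer above):

```python
import numpy as np

# Author A's rows, in their original order
s = np.array([1990, 1990, 1991, 1994, 1995])
co = np.array(['B', 'C', 'B', 'D', 'D'])

# Broadcasting a (5,) array against a (5, 1) array gives a 5x5 matrix:
# a[i, j] = s[j] - s[i]
a = s - s[:, None]

# Row i flags the rows j whose Year lies in the window [s[i], s[i] + 2]
mask = (a >= 0) & (a <= 2)
print(mask.astype(int))

# Co-authors per window; dict.fromkeys drops repeats but keeps first-seen order
windows = [list(dict.fromkeys(co[m].tolist())) for m in mask]
print(windows)  # [['B', 'C'], ['B', 'C'], ['B'], ['D'], ['D']]
```

Rows 0 and 1 (both 1990) produce the same window, which is why the answer calls `drop_duplicates` afterwards. `np.unique` would also remove repeats but sorts the result; `dict.fromkeys` keeps first-seen order, as `Series.unique` does.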

Upvotes: 2
