Reputation: 2333
I have a pandas dataframe with the following structure:
author Year co_author
A 1990 B
A 1990 C
A 1991 B
A 1994 D
A 1995 D
B 1990 A
B 1991 C
B 1991 E
B 1998 C
I would like to list the co-authors that each author has ever worked with in a 3-year window. So for the above, the result should be as below:
author 3-Year-window co_authors_list
A 1990-1992 [B,C]
A 1991-1993 [B,C]
A 1992-1994 [D]
A 1994-1996 [D]
A 1995-1997 [D]
B 1990-1992 [A,C,E]
B 1991-1993 [C,E]
B 1998-2000 [C]
I know how to group with a one-year window, but not with a three-year one. This is the code for the one-year window:
df.groupby(['author','Year'])['co_author'].apply(list)
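For reproducibility, here is a minimal sketch that builds the sample frame from the table above and runs the one-year grouping. The variable name `one_year` is only for illustration and is not part of the original post.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'author':    ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Year':      [1990, 1990, 1991, 1994, 1995, 1990, 1991, 1991, 1998],
    'co_author': ['B', 'C', 'B', 'D', 'D', 'A', 'C', 'E', 'C'],
})

# One-year window: collect the co-authors for each (author, Year) pair.
one_year = df.groupby(['author', 'Year'])['co_author'].apply(list)
print(one_year)
```

The result is a Series indexed by (author, Year), with one list of co-authors per pair.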
Upvotes: 3
Views: 64
Reputation: 323376
I am using numpy broadcasting within a groupby,
then re-creating the dataframe
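import numpy as np
import pandas as pd

# Assumes df is sorted by author, so the grouped results line up with the original row order.
l = []
for _, y in df.groupby('author'):
    s = y.Year.values
    # a[i, j] = Year[j] - Year[i]; row i marks the rows inside the window starting at Year[i]
    a = s - s[:, None]
    # Extend with plain lists so results of different lengths never pass through np.concatenate
    l.extend(list(y.co_author[m].unique()) for m in np.logical_and(a >= 0, a <= 2))
df = pd.DataFrame({'author': df.author,
                   'Year': df.Year.astype(str) + '-' + (df.Year + 2).astype(str),
                   'co_authors_list': l}).drop_duplicates(['author', 'Year'])
df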
Out[337]:
author Year co_authors_list
0 A 1990-1992 [B, C]
2 A 1991-1993 [B]
3 A 1994-1996 [D]
4 A 1995-1997 [D]
5 B 1990-1992 [A, C, E]
6 B 1991-1993 [C, E]
8 B 1998-2000 [C]
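To make the broadcasting step easier to follow, here is a small sketch of the window mask for author A alone. The years are copied from the question's data; the name `mask` is introduced here only for illustration.

```python
import numpy as np

# Years for author A, in row order, taken from the question's data.
s = np.array([1990, 1990, 1991, 1994, 1995])

# a[i, j] = s[j] - s[i]: how many years row j lies after row i.
a = s - s[:, None]

# mask[i, j] is True when row j falls inside the window [s[i], s[i] + 2].
mask = np.logical_and(a >= 0, a <= 2)
print(mask.astype(int))
```

Each row of the mask picks the rows of the group that belong to the window starting at that row's year. The row for 1991 selects only itself, which is why the 1991-1993 window for A contains only B.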
Upvotes: 2