user3314418

Reputation: 3041

Getting unique rows conditioned on year pandas python dataframe

I have a dataframe of the form below. In my final dataframe, I'd like to keep only one row per unique name in each year.

     Name                    Org         Year
4    New York University     doclist[1]  2004
5    Babson College          doclist[2]  2008
6    Babson College          doclist[5]  2008

So ideally, my dataframe would look like this instead:

4    New York University     doclist[1]  2004
5    Babson College          doclist[2]  2008

Here is what I've done so far: I grouped by year, and I seem to be able to get the unique names per year. However, I am stuck because I lose all the other information, such as the "Org" column. Advice appreciated!

#how to get unique rows per year?
q = z.groupby(['Year'])

#print q.head()
#q.reset_index(level=0, drop=True)

q.Name.apply(lambda x: np.unique(x))

For this I get the following output. How do I include the other columns as well, and remove the secondary index (e.g. 6, 68, 66, 72)?

Year                                          
2008  6                                        Babson College
      68               European Economic And Social Committee
      66                                       European Union
      72                     Ewing Marion Kauffman Foundation

Upvotes: 0

Views: 106

Answers (1)

chrisb

Reputation: 52256

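If all you want to do is keep the first entry for each name, you can use drop_duplicates. Note that this keeps the first entry according to however your data is currently sorted, so you may want to sort first if you want to keep a specific entry. Also note that subset='Name' deduplicates across all years; to treat the same name in different years as distinct rows, pass both columns, i.e. subset=['Name', 'Year'].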

In [98]: q.drop_duplicates(subset='Name')
Out[98]: 
                      Name         Org  Year
0      New York University  doclist[1]  2004
1           Babson College  doclist[2]  2008

Upvotes: 1
