Collective Action
Collective Action

Reputation: 8009

Pandas: delete duplicate rows

I have the following df:

url='https://raw.githubusercontent.com/108michael/ms_thesis/master/crsp.dime.mpl.abbridged'

zz=pd.read_csv(url)
zz.head(30)

    date    feccandid   feccandcfscore.dyn  pacid   paccfscore  cid     catcode     type_x  di  amtsum  state   log_diff_unemployment   party   type_y  bills   years_exp   disposition     billsum
0   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
1   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
2   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
3   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
4   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
5   2006    S8NV00073   0.496   C00000422   0.330   N00006619   H1100   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     3
6   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
7   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
8   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
9   2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
10  2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
11  2006    S8NV00073   0.496   C00375360   0.176   N00006619   H1100   24K     D   4500    NV  -0.024693   Republican  rep     s22-109     12  support     3
12  2006    S8NV00073   0.496   C00113803   0.269   N00006619   H1130   24K     D   2500    NV  -0.024693   Republican  rep     s22-109     12  support     2
13  2006    S8NV00073   0.496   C00113803   0.269   N00006619   H1130   24K     D   2500    NV  -0.024693   Republican  rep     s22-109     12  support     2
14  2006    S8NV00073   0.496   C00249342   0.421   N00006619   H1130   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     2
15  2006    S8NV00073   0.496   C00249342   0.421   N00006619   H1130   24K     D   5000    NV  -0.024693   Republican  rep     s22-109     12  support     2

Some of the rows are complete duplicates of each other. Is there a way to delete duplicate rows?

Upvotes: 1

Views: 175

Answers (1)

jezrael
jezrael

Reputation: 862431

I think you can use drop_duplicates:

print zz.drop_duplicates()

Upvotes: 2

Related Questions