Oleg Tarasenko

Reputation: 9610

python: Working with pandas. Getting counts

I have a data set in which each row looks like this:

<link>, <type>

For example, types can be "dofollow", "nofollow" and "javascript".

Since every link may appear many times in the data set, I need the result in the following form, with one row per link and one count column per type:

link, dofollow, nofollow, javascript
http://somelink.com, 10 (i.e. it appeared 10 times as dofollow), 0, 101

Upvotes: 1

Views: 54

Answers (1)

Andy Hayden

Reputation: 375367

You can use a groupby size:

In [11]: df = pd.DataFrame([['a_link', 'dofollow'], ['a_link', 'dofollow'], ['a_link', 'nofollow'], ['b_link', 'javascript']], columns=['link', 'type'])

In [12]: df
Out[12]: 
     link        type
0  a_link    dofollow
1  a_link    dofollow
2  a_link    nofollow
3  b_link  javascript

In [13]: df.groupby(['link', 'type']).size()
Out[13]: 
link    type      
a_link  dofollow      2
        nofollow      1
b_link  javascript    1
dtype: int64

Now unstack the second index level (type) so that each type becomes a column, then fill the missing link/type combinations with 0:

In [14]: df.groupby(['link', 'type']).size().unstack(1)
Out[14]: 
type    dofollow  javascript  nofollow
link                                  
a_link         2         NaN         1
b_link       NaN           1       NaN

In [15]: df.groupby(['link', 'type']).size().unstack(1).fillna(0)
Out[15]: 
type    dofollow  javascript  nofollow
link                                  
a_link         2           0         1
b_link         0           1         0
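
For reference, the same steps can be put together as a standalone script. This is only a sketch under a few assumptions: the `types` list is a hypothetical name holding the three types from the question in the requested column order, and the sample data is the toy frame from above. It passes `fill_value=0` to `unstack`, which fills the gaps during the reshape so the counts stay integers (with `fillna(0)` the columns have already been converted to float by the NaN values). `pd.crosstab` is shown as an alternative that should give the same table in one call.

```python
import pandas as pd

# Toy data from the answer above.
df = pd.DataFrame(
    [["a_link", "dofollow"], ["a_link", "dofollow"],
     ["a_link", "nofollow"], ["b_link", "javascript"]],
    columns=["link", "type"],
)

# Assumption: these are all the types, in the column order the question asks for.
types = ["dofollow", "nofollow", "javascript"]

# Count rows per (link, type), pivot the type level into columns,
# and fill missing combinations with 0 while unstacking.
counts = (
    df.groupby(["link", "type"])
      .size()
      .unstack("type", fill_value=0)
      .reindex(columns=types, fill_value=0)  # also adds any type that never occurs
)

# Alternative: crosstab builds the frequency table directly.
alt = pd.crosstab(df["link"], df["type"]).reindex(columns=types, fill_value=0)

print(counts)
```

The `reindex` step is there so that a type which never appears in the data still gets a column of zeros; drop it if the natural (alphabetical) column order is fine.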

Upvotes: 2
