Augusto Baldo
Augusto Baldo

Reputation: 119

How to count values of a column based on a pre-determined rule, and create new columns with that counted numbers?

The title was hard to sumarize, but I'm kind of stuck in a problem that consists in values in a table like these:

       Star        Star_planets      Planet_size    
0      star1      star1_planet1         15.2  
1      star2      star2_planet1          3.3
2      star2      star2_planet2          1.8    
3      star2      star2_planet3         13.0
4      star3      star3_planet1         10.3
5      star3      star3_planet2          6.7
6      star3      star3_planet3         12.2

What I'm trying to do is create a new table with 4 new columns that count the number of planets a star has based on the size of such planets, categorizing them. Also, the 'Star_planets' and 'Planet_size' column shouldn't be present, since they would've become obsolete. This table would look like this:

       Star        Big_planet   Med_planet   Sml_planet   Tny_planet
0      star1           1            0            0            0
1      star2           1            0            1            1
2      star3           2            1            0            0

The limits used to determine the planet sizes are these:

Big: >8
Medium: 4<Size<=8
Small: 2<Size<=4
Tiny: <=2

I've done this to try and separate the sizes in groups, but I really don't know how to proceed from here:

df = pd.read_csv("Table_planets.csv")

df['Tn'] = df.loc[df.[Planet_size] <= 2, 'Planet_size']
df['Sm'] = df.loc[(df.[Planet_size] > 2)&(df.[Planet_size] <=4), 'Planet_size']
df['Md'] = df.loc[(df.[Planet_size] > 4)&(df.[Planet_size] <=8), 'Planet_size']
df['Bg'] = df.loc[df.[Planet_size] > 8, 'Planet_size']

How can I achieve that table?

Upvotes: 1

Views: 48

Answers (3)

Quang Hoang
Quang Hoang

Reputation: 150765

You can do a cut first:

# change the labels as you wish
s = pd.cut(df.Planet_size, [-np.inf, 2, 4,8, np.inf], 
           right=True, labels=['Tiny','Small','Medium','Big'])

pd.crosstab(df['Star'], s).reset_index()

Output:

Planet_size  Tiny  Small  Medium  Big
Star                                 
star1           0      0       0    1
star2           1      1       0    1
star3           0      0       1    2

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Another way

Create class using np.select

 condtions=[df['Planet_size']>8,df['Planet_size'].between(4,8),df['Planet_size'].between(2,4),df['Planet_size']<2]
 choices=['Big', 'Medium','Small','Tiny']
 df['class']=np.select(condtions, choices)

.groupby() star,.value_countS() and .unstack() while adding suffix. ReplaceNa with 0

 df.groupby('Star')['class'].value_counts().unstack('class').add_suffix('_planet').fillna(0).reset_index()

enter image description here

Upvotes: 0

jfaccioni
jfaccioni

Reputation: 7519

First let's add a categorical column:

def planet_category(planet_size): 
    if planet_size > 8: 
        return 'Big' 
    elif planet_size > 4: 
        return 'Medium' 
    elif planet_size > 2: 
        return 'Small' 
    return 'Tiny'

df['Planet_category'] = df['Planet_size'].apply(planet_category)
df['Planet_category']

output:

0       Big
1     Small
2      Tiny
3       Big
4       Big
5    Medium
6       Big
Name: Planet_category, dtype: object

Then we can one-hot encode the category with pd.get_dummies, group by the Star column and sum the resulting values with:

pd.get_dummies(df, columns=['Planet_category']).groupby('Star').sum()

output:

       Planet_size  Planet_category_Big  Planet_category_Medium  Planet_category_Small  Planet_category_Tiny
Star                                                                                                        
star1         15.2                    1                       0                      0                     0
star2         18.1                    1                       0                      1                     1
star3         29.2                    2                       1                      0                     0

Note that this turns the Star column into an index. Add a call to reset_index() if you want Star back as a column:

pd.get_dummies(df, columns=['Planet_category']).groupby('Star').sum().reset_index()

output:

    Star  Planet_size  Planet_category_Big  Planet_category_Medium  Planet_category_Small  Planet_category_Tiny
0  star1         15.2                    1                       0                      0                     0
1  star2         18.1                    1                       0                      1                     1
2  star3         29.2                    2                       1                      0                     0

Upvotes: 1

Related Questions