Reputation: 119
The title was hard to sumarize, but I'm kind of stuck in a problem that consists in values in a table like these:
Star Star_planets Planet_size
0 star1 star1_planet1 15.2
1 star2 star2_planet1 3.3
2 star2 star2_planet2 1.8
3 star2 star2_planet3 13.0
4 star3 star3_planet1 10.3
5 star3 star3_planet2 6.7
6 star3 star3_planet3 12.2
What I'm trying to do is create a new table with 4 new columns that count the number of planets a star has based on the size of such planets, categorizing them. Also, the 'Star_planets' and 'Planet_size' column shouldn't be present, since they would've become obsolete. This table would look like this:
Star Big_planet Med_planet Sml_planet Tny_planet
0 star1 1 0 0 0
1 star2 1 0 1 1
2 star3 2 1 0 0
The limits used to determine the planet sizes are these:
Big: >8
Medium: 4<Size<=8
Small: 2<Size<=4
Tiny: <=2
I've done this to try and separate the sizes in groups, but I really don't know how to proceed from here:
df = pd.read_csv("Table_planets.csv")
df['Tn'] = df.loc[df.[Planet_size] <= 2, 'Planet_size']
df['Sm'] = df.loc[(df.[Planet_size] > 2)&(df.[Planet_size] <=4), 'Planet_size']
df['Md'] = df.loc[(df.[Planet_size] > 4)&(df.[Planet_size] <=8), 'Planet_size']
df['Bg'] = df.loc[df.[Planet_size] > 8, 'Planet_size']
How can I achieve that table?
Upvotes: 1
Views: 48
Reputation: 150765
You can do a cut first:
# change the labels as you wish
s = pd.cut(df.Planet_size, [-np.inf, 2, 4,8, np.inf],
right=True, labels=['Tiny','Small','Medium','Big'])
pd.crosstab(df['Star'], s).reset_index()
Output:
Planet_size Tiny Small Medium Big
Star
star1 0 0 0 1
star2 1 1 0 1
star3 0 0 1 2
Upvotes: 1
Reputation: 26676
Another way
Create class using np.select
condtions=[df['Planet_size']>8,df['Planet_size'].between(4,8),df['Planet_size'].between(2,4),df['Planet_size']<2]
choices=['Big', 'Medium','Small','Tiny']
df['class']=np.select(condtions, choices)
.groupby()
star,.value_countS()
and .unstack()
while adding suffix
. ReplaceNa
with 0
df.groupby('Star')['class'].value_counts().unstack('class').add_suffix('_planet').fillna(0).reset_index()
Upvotes: 0
Reputation: 7519
First let's add a categorical column:
def planet_category(planet_size):
if planet_size > 8:
return 'Big'
elif planet_size > 4:
return 'Medium'
elif planet_size > 2:
return 'Small'
return 'Tiny'
df['Planet_category'] = df['Planet_size'].apply(planet_category)
df['Planet_category']
output:
0 Big
1 Small
2 Tiny
3 Big
4 Big
5 Medium
6 Big
Name: Planet_category, dtype: object
Then we can one-hot encode the category with pd.get_dummies
, group by the Star
column and sum the resulting values with:
pd.get_dummies(df, columns=['Planet_category']).groupby('Star').sum()
output:
Planet_size Planet_category_Big Planet_category_Medium Planet_category_Small Planet_category_Tiny
Star
star1 15.2 1 0 0 0
star2 18.1 1 0 1 1
star3 29.2 2 1 0 0
Note that this turns the Star
column into an index. Add a call to reset_index()
if you want Star
back as a column:
pd.get_dummies(df, columns=['Planet_category']).groupby('Star').sum().reset_index()
output:
Star Planet_size Planet_category_Big Planet_category_Medium Planet_category_Small Planet_category_Tiny
0 star1 15.2 1 0 0 0
1 star2 18.1 1 0 1 1
2 star3 29.2 2 1 0 0
Upvotes: 1