Reputation: 21
I am making a web application which reads data from an Excel file. The data set I have has a columns which has data of categories and sub-categories of books such as 'Fiction.Romantic', 'Fiction.Thriller', 'Sports.Imaginative', 'Sports.AutoBiographic' etc.
I want the pandas to filter out data and print the total/combined values as 'Sports', 'Fiction' etc.
Is there a way I could remove the subcategories ('.Imaginative' , '.Thriller') and print/display as desired?
There are a lot of categories and subcategories so doing them manually will take a lot of time.
Actual result:
Name Category Amount
ABC Sports.Autobiographic. 7485
BCD. Sports.Imaginative. 20
Expected result:
Name Category Amount
ABC Sports 7485
BCD. Sports 20
Upvotes: 2
Views: 936
Reputation: 862601
Use Series.str.split
by first .
only by n=1
and select firts lists by indexing:
df['Category'] = df['Category'].str.split('.', n=1).str[0]
print (df)
Name Category Amount
0 ABC Sports 7485
1 BCD. Sports 20
Or use Series.str.extract
:
df['Category'] = df['Category'].str.extract(r'([a-zA-Z]+)\.')
print (df)
Name Category Amount
0 ABC Sports 7485
1 BCD. Sports 20
Upvotes: 2