Akhilesh Chobey

Reputation: 317

Set index in pandas

I am working on the Olympics dataset related to this

This is what the dataframe looks like:

                Unnamed: 0  # Summer  01 !  02 !  03 !  Total  # Winter  \
0        Afghanistan (AFG)        13     0     0     2      2         0   
1            Algeria (ALG)        12     5     2     8     15         3   
2          Argentina (ARG)        23    18    24    28     70        18   
3            Armenia (ARM)         5     1     2     9     12         6   
4  Australasia (ANZ) [ANZ]         2     3     4     5     12         0 

I want to do the following things:

For example, the updated column should be:

    Unnamed: 0  # Summer  01 !  02 !  03 !  Total  # Winter  \
0  Afghanistan        13     0     0     2      2         0   
1      Algeria        12     5     2     8     15         3   
2    Argentina        23    18    24    28     70        18   
3      Armenia         5     1     2     9     12         6   
4  Australasia         2     3     4     5     12         0 

Please show me a proper way to achieve this.

Upvotes: 2

Views: 1559

Answers (2)

bhansa

Reputation: 7504

Split the column on the first space to get two columns, Country and countryCode, then set Country as the index:

df2 = pd.DataFrame(df['Unnamed: 0'].str.split(' ', n=1).tolist(), columns=['Country', 'countryCode']).set_index('Country')

This way the country code stays in your dataframe as additional info.

To remove the extra bracketed text, such as [ANZ], use a regex (as mentioned in the other answer):

df2 = df2.replace(r'\[.*?\]', '', regex=True)
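Splitting on a space breaks for country names that contain spaces, so here is a minimal sketch of the same idea using str.extract instead. The sample frame is a hypothetical three-row stand-in for the question's data, and the column names Country and countryCode are just the ones used above:

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the question's frame.
df = pd.DataFrame({
    "Unnamed: 0": ["Afghanistan (AFG)", "Algeria (ALG)", "Australasia (ANZ) [ANZ]"],
    "# Summer": [13, 12, 2],
})

# Capture the name before the first "(" and the code inside the parentheses.
parts = df["Unnamed: 0"].str.extract(
    r"^(?P<Country>.*?)\s*\((?P<countryCode>[^)]*)\)"
)

# Swap the raw column for the two extracted ones and index by country.
df2 = df.drop(columns="Unnamed: 0").join(parts).set_index("Country")
```

Unlike the split version, this keeps the other columns and drops any trailing [ANZ]-style suffix in the same step.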

Upvotes: 1

Bharath M Shetty

Reputation: 30605

You can use a regex with replace to do that, i.e.:

df = df.replace(r'\s*(\(.+?\)|\[.+?\])', '', regex=True).rename(columns={'Unnamed: 0': 'Country'}).set_index('Country')

Output:

               Summer  01 !  02 !  03 !  Total  Winter
Country                                               
Afghanistan        13     0     0     2      2       0
Algeria            12     5     2     8     15       3
Argentina          23    18    24    28     70      18
Armenia             5     1     2     9     12       6
Australasia         2     3     4     5     12       0

If you don't want to rename the column, use .set_index('Unnamed: 0') instead.

Or, thanks to @Scott, a much easier solution is to split on ( and select the first element, i.e.:

df['Unnamed: 0'] = df['Unnamed: 0'].str.split('(').str[0].str.strip()
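To compare the two approaches side by side, here is a small sketch on a hypothetical three-row sample (not the full dataset). It applies the regex to the name column only, rather than the whole frame, so other string columns are left alone; that scoping is my assumption about what is wanted here:

```python
import pandas as pd

# Hypothetical sample mirroring a few rows of the question's frame.
df = pd.DataFrame({
    "Unnamed: 0": ["Afghanistan (AFG)", "Algeria (ALG)", "Australasia (ANZ) [ANZ]"],
    "# Summer": [13, 12, 2],
    "Total": [2, 15, 12],
})

# Approach 1: strip "(...)" and "[...]" plus the whitespace before them.
by_regex = df["Unnamed: 0"].str.replace(r"\s*(\(.+?\)|\[.+?\])", "", regex=True)

# Approach 2: keep everything before the first "(" and trim the trailing space.
by_split = df["Unnamed: 0"].str.split("(").str[0].str.strip()

# Either series can become the index; here the regex result is used.
result = (
    df.assign(Country=by_regex)
      .drop(columns="Unnamed: 0")
      .set_index("Country")
)
```

Both series come out identical on this sample. The split version would leave a name untouched if it had no "(" at all, while still dropping anything after the first "(".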

Upvotes: 3
