Reputation: 317
I am working on the Olympics dataset related to this
This is what the dataframe looks like:
Unnamed: 0 # Summer 01 ! 02 ! 03 ! Total # Winter \
0 Afghanistan (AFG) 13 0 0 2 2 0
1 Algeria (ALG) 12 5 2 8 15 3
2 Argentina (ARG) 23 18 24 28 70 18
3 Armenia (ARM) 5 1 2 9 12 6
4 Australasia (ANZ) [ANZ] 2 3 4 5 12 0
I want to do the following things:
For example, the updated column should be:
Unnamed: 0 # Summer 01 ! 02 ! 03 ! Total # Winter \
0 Afghanistan 13 0 0 2 2 0
1 Algeria 12 5 2 8 15 3
2 Argentina 23 18 24 28 70 18
3 Armenia 5 1 2 9 12 6
4 Australasia 2 3 4 5 12 0
Please show me a proper way to achieve this.
Upvotes: 2
Views: 1559
Reputation: 7504
Splitting to get two columns, Country and countryCode, and setting Country as the index:
df2 = pd.DataFrame(df['Unnamed: 0'].str.split(' ', n=1).tolist(), columns=['Country', 'countryCode']).set_index('Country')
This way you also keep the country code as additional information in your dataframe.
To remove the extra bracketed text, which I suppose looks like [ANZ], use a regex (as mentioned in the other answer):
df2 = df2.replace(r'\s*\[.*?\]', '', regex=True)
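The split-then-clean idea above can also be sketched in a single step with `str.extract`. This is a different technique from the one in this answer, shown only as a minimal sketch: the sample rows are copied from the question, the `Total` column stands in for the other medal columns, and the pattern assumes every entry has the form `Name (CODE)` with an optional trailing `[...]`.

```python
import pandas as pd

# Sample rows copied from the question; the real dataset has more columns.
df = pd.DataFrame({
    "Unnamed: 0": [
        "Afghanistan (AFG)",
        "Algeria (ALG)",
        "Australasia (ANZ) [ANZ]",
    ],
    "Total": [2, 15, 12],
})

# Capture the text before the first "(" as the name and the text inside the
# parentheses as the code. A trailing "[...]" is not captured, so it is dropped.
pattern = r"^(?P<Country>.+?)\s*\((?P<countryCode>[^)]*)\)"
parts = df["Unnamed: 0"].str.extract(pattern)

# Swap the combined column for the two extracted ones and index by country.
df2 = df.drop(columns="Unnamed: 0").join(parts).set_index("Country")
print(df2)
```

Because the name is matched up to the first opening parenthesis rather than the first space, this should also cope with country names that contain spaces.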
Upvotes: 1
Reputation: 30605
You can use a regex with replace to do that, i.e.
df = df.replace(r'\s*(\(.+?\)|\[.+?\])', '', regex=True).rename(columns={'Unnamed: 0':'Country'}).set_index('Country')
Output:
             Summer  01 !  02 !  03 !  Total  Winter
Country
Afghanistan      13     0     0     2      2       0
Algeria          12     5     2     8     15       3
Argentina        23    18    24    28     70      18
Armenia           5     1     2     9     12       6
Australasia       2     3     4     5     12       0
If you don't want to rename the column, use .set_index('Unnamed: 0') instead.
Or, thanks to @Scott, a much easier solution is to split on ( and select the first element, i.e.
df['Unnamed: 0'] = df['Unnamed: 0'].str.split('(').str[0].str.strip()
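For anyone who wants to try both approaches side by side, here is a small self-contained sketch. It is only an illustration: the three sample rows come from the question, and the whitespace handling (`\s*` in the pattern, `.str.strip()` after the split) is added so that the cleaned names carry no trailing space.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Unnamed: 0": ["Afghanistan (AFG)", "Armenia (ARM)", "Australasia (ANZ) [ANZ]"],
    "# Summer": [13, 5, 2],
})

names = df["Unnamed: 0"]

# Option 1: delete every "(...)" or "[...]" group plus the whitespace before it.
by_regex = names.str.replace(r"\s*(\(.*?\)|\[.*?\])", "", regex=True)

# Option 2: keep the text before the first "(" and trim the leftover space.
by_split = names.str.split("(").str[0].str.strip()

# Both options give the same cleaned names on this sample.
assert by_regex.equals(by_split)

df["Unnamed: 0"] = by_split
df = df.rename(columns={"Unnamed: 0": "Country"}).set_index("Country")
print(df)
```

The split version is shorter, but it relies on every name being followed by a parenthesised code; the regex version also handles a stray `[...]` that appears without one.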
Upvotes: 3