Reputation: 10011
Given a dataframe as follows:
player score
0 Sergio Agüero Forward — Manchester City 209.98
1 Eden Hazard Midfield — Chelsea 274.04
2 Alexis Sánchez Forward — Arsenal 223.86
3 Yaya Touré Midfield — Manchester City 197.91
4 Angel María Midfield — Manchester United 132.23
How could I split player into three new columns: name, position and team?
player score name position team
0 Sergio Agüero Forward — Manchester City 209.98 Sergio Forward Manchester City
1 Eden Hazard Midfield — Chelsea 274.04 Eden Midfield Chelsea
2 Alexis Sánchez Forward — Arsenal 223.86 Alexis Forward Arsenal
3 Yaya Touré Midfield — Manchester City 197.91 Yaya Midfield Manchester City
4 Angel María Midfield — Manchester United 132.23 Angel Midfield Manchester United
I have considered splitting it into two columns with df[['name_position', 'team']] = df['player'].str.split(pat=' — ', expand=True), then splitting name_position into name and position. Is there a better solution?
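The two-step approach described above could be sketched roughly as follows. This is a minimal sketch, assuming the name is always the first word and the position is always the last word before the dash; the two-row frame is cut down from the sample data.

```python
import pandas as pd

# Two rows taken from the sample data above.
df = pd.DataFrame({
    "player": [
        "Sergio Agüero Forward — Manchester City",
        "Eden Hazard Midfield — Chelsea",
    ],
    "score": [209.98, 274.04],
})

# Step 1: separate the team from everything before the dash.
df[["name_position", "team"]] = df["player"].str.split(pat=" — ", expand=True)

# Step 2: assume the first word is the name and the last word is the position.
words = df["name_position"].str.split()
df["name"] = words.str[0]
df["position"] = words.str[-1]

# The intermediate column is no longer needed.
df = df.drop(columns="name_position")
```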
Many thanks.
Upvotes: 1
Views: 1722
Reputation: 1239
You can split a Python string on whitespace with string.split(). This breaks your text into 'words', and you can then pick out the ones you need by index, like this:
string = "Sergio Agüero Forward — Manchester City"
words = string.split()  # ['Sergio', 'Agüero', 'Forward', '—', 'Manchester', 'City']
name = words[0]
position = words[2]
# The team may be more than one word, so join everything after the dash.
team = " ".join(words[4:])
For more complex patterns, you can use a regular expression (regex), which is a powerful tool for matching string patterns.
Hope this helped :)
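As a rough illustration of the regex route on a single string, here is a minimal sketch using Python's built-in re module. The pattern is my own assumption about the layout (first word is the name, the word right before the dash is the position, everything after the dash is the team), so adjust it if your data differs.

```python
import re

string = "Sergio Agüero Forward — Manchester City"

# Assumed layout: <name> [other words] <position> — <team>
pattern = re.compile(r"(?P<name>\S+)\s.*?(?P<position>\S+)\s—\s(?P<team>.+)")

match = pattern.match(string)
name = match.group("name")
position = match.group("position")
team = match.group("team")
```

Unlike fixed indices, this does not depend on how many words the full name or the team has.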
Upvotes: 1
Reputation: 22493
You can use str.extract as well if you want to do it in one go:
print(df["player"].str.extract(r"(?P<name>.*?)\s.*?\s(?P<position>[A-Za-z]+)\s—\s(?P<team>.*)"))
name position team
0 Sergio Forward Manchester City
1 Eden Midfield Chelsea
2 Alexis Forward Arsenal
3 Yaya Midfield Manchester City
4 Angel Midfield Manchester United
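To get the layout asked for in the question, the extracted columns can be attached to the original frame. A minimal sketch, assuming the sample data from the question and using DataFrame.join on the shared index (the variable names pattern and result are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "player": [
        "Sergio Agüero Forward — Manchester City",
        "Eden Hazard Midfield — Chelsea",
        "Alexis Sánchez Forward — Arsenal",
        "Yaya Touré Midfield — Manchester City",
        "Angel María Midfield — Manchester United",
    ],
    "score": [209.98, 274.04, 223.86, 197.91, 132.23],
})

# Same pattern as above; the named groups become the new column names.
pattern = r"(?P<name>.*?)\s.*?\s(?P<position>[A-Za-z]+)\s—\s(?P<team>.*)"

# str.extract keeps the original index, so join lines the rows up.
result = df.join(df["player"].str.extract(pattern))
```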
Upvotes: 2