Reputation: 788
I have a data set like this, and I want to split the Name column into two columns: the Name column should be overwritten with the first name and surname, and a new MiddleName column should contain only the middle name, including the brackets.
In [1]: dd = {'Name' : ['Daniel [Jack] Horn', 'Marcus [Martin] Dwell', 'Greg [Alex] Waltz']}
In [2]: dd_frame = pd.DataFrame(dd)
In [3]: dd_frame
Out[3]:
Name
0 Daniel [Jack] Horn
1 Marcus [Martin] Dwell
2 Greg [Alex] Waltz
The expected output is
           Name  MiddleName
0   Daniel Horn      [Jack]
1  Marcus Dwell    [Martin]
2    Greg Waltz      [Alex]
What would be a simple way to do this without splitting into 3 columns and merging the 1st and 3rd?
Upvotes: 2
Views: 135
Reputation: 28709
An addition to the excellent answers, using str.split with a capturing group so that the bracketed part is kept in the result:
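# Splitting on a capturing group keeps the bracketed part as its own item;
# strip() removes the spaces left on either side of it.
extracts = [(f"{first.strip()} {last.strip()}", middle)
            for first, middle, last in
            dd_frame.Name.str.split(r"(\[.+\])")]
pd.DataFrame(extracts, columns=["Name", "MiddleName"])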
           Name  MiddleName
0   Daniel Horn      [Jack]
1  Marcus Dwell    [Martin]
2    Greg Waltz      [Alex]
Upvotes: 1
Reputation: 2111
df["Middle Name"] = df.Name.apply(lambda x: x.split(" ")[1][1:-1])
                    Name  Middle Name
0     Daniel [Jack] Horn         Jack
1  Marcus [Martin] Dwell       Martin
2      Greg [Alex] Waltz         Alex
By far the worst way to do what you want, but it works. First, it splits the name on " " (space). The middle item of the resulting list is the bracketed middle name, and the [1:-1] slice then takes what is between the brackets. If you want to keep the brackets, remove the [1:-1].
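To make the steps above concrete, here is a minimal sketch on a single plain string. The variable names are illustrative only, and it assumes every name has exactly three space-separated parts with the bracketed middle name in second position.

```python
# Illustrative only: walk through split / index / slice on one value.
full_name = "Daniel [Jack] Horn"

parts = full_name.split(" ")   # list of the three space-separated parts
bracketed = parts[1]           # middle item, brackets still attached
middle = bracketed[1:-1]       # drop the first and last character (the brackets)
```

A name with an extra space, or with no bracketed part, would break the `parts[1]` assumption, which is why the regex-based approaches are more robust.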
import re
df["MidName"] = df.Name.apply(lambda x: re.findall(r'\[[^\]]*\]', x)[0])
#output
                    Name  Middle Name   MidName
0     Daniel [Jack] Horn         Jack    [Jack]
1  Marcus [Martin] Dwell       Martin  [Martin]
2      Greg [Alex] Waltz         Alex    [Alex]
This uses regex, although I am not an expert in it. findall returns a list of every bracketed match, so you have to take the first element of that list; otherwise the column would hold the list ['[Jack]'] instead of the string [Jack].
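As a small illustration of why the [0] is needed, the sketch below runs the same pattern on one string with the standard re module. The variable names are made up for the example.

```python
import re

# findall always returns a list, even when there is only one match.
matches = re.findall(r"\[[^\]]*\]", "Daniel [Jack] Horn")

# Indexing with [0] pulls the single matched string out of that list.
first_match = matches[0]
```

Note that `matches[0]` raises an IndexError when a name has no bracketed part, so this only holds under the assumption that every row contains one.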
Upvotes: 2
Reputation: 13349
Try using regex:
df = dd_frame
df['Middle Name'] = df['Name'].str.extract(r"\[(.*)\]", expand=False)
df['Name'] = df['Name'].str.replace(r"\s+\[(.*)\]", "", regex=True)
           Name  Middle Name
0   Daniel Horn         Jack
1  Marcus Dwell       Martin
2    Greg Waltz         Alex
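The output above drops the brackets, while the question asks to keep them. A possible variant, sketched here under the assumption that each name contains exactly one bracketed part, moves the brackets inside the capture group and uses the MiddleName column name from the question:

```python
import pandas as pd

dd_frame = pd.DataFrame(
    {"Name": ["Daniel [Jack] Horn", "Marcus [Martin] Dwell", "Greg [Alex] Waltz"]}
)

# Capture the bracketed part, brackets included; expand=False returns a Series.
dd_frame["MiddleName"] = dd_frame["Name"].str.extract(r"(\[.*?\])", expand=False)

# Remove the bracketed part together with the whitespace in front of it.
dd_frame["Name"] = dd_frame["Name"].str.replace(r"\s*\[.*?\]", "", regex=True)
```

The extract has to run before the replace, since the replace overwrites the Name column that the extract reads from.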
Upvotes: 3
Reputation: 70
Here's how you can do it!
# Assumes every name has exactly three whitespace-separated parts.
split = dd_frame['Name'].str.split(expand=True)
dd_frame['Name'] = split[0] + ' ' + split[2]
dd_frame['MiddleName'] = split[1]
Upvotes: 0