Reputation: 228
I have a column containing data that I want to parse and then spread across other columns. So far the best I can get is with the apply method:
def parse_parent_names(row):
    split = row.person_with_parent_names.split('|')[2:-1]
    return split

df['parsed'] = train_data.apply(parse_parent_names, axis=1).head()
The data is a pandas DataFrame with a column that holds names separated by a pipe (|):
'person_with_parent_names'
|John|Doe|Bobba|
|Fett|Bobba|
|Abe|Bea|Cosby|
The rightmost name is the person and the leftmost is the "grandest parent". I'd like to transform this into three columns, like:
'grandfather'  'father'  'person'
John           Doe       Bobba
               Fett      Bobba
Abe            Bea       Cosby
But with apply, the best I can get is
'parsed'
[John, Doe, Bobba]
[Fett, Bobba]
[Abe, Bea, Cosby]
I could use apply three times, but it would not be efficient to read the entire dataset three times.
Upvotes: 1
Views: 31
Reputation: 863301
Change your function to count the number of | characters and pick the slice with a conditional (ternary) expression, then pass the resulting lists to the DataFrame constructor:
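def parse_parent_names(row):
    # Four pipes means three names (grandfather, father, person).
    m = row.count('|') == 4
    # Otherwise (e.g. two names) keep the leading empty string so that
    # grandfather is left blank.
    split = row.split('|')[1:-1] if m else row.split('|')[:-1]
    return split

cols = ['grandfather','father','person']
df1 = pd.DataFrame([parse_parent_names(x) for x in df.person_with_parent_names],
                   columns=cols)
print(df1)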
  grandfather father person
0        John    Doe  Bobba
1               Fett  Bobba
2         Abe    Bea  Cosby
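The same idea can be sketched a little more generally: rather than counting pipes, strip them and left-pad each list of names so the person always lands in the last column. This is only a minimal sketch, assuming no row holds more than three names; the sample frame and the `width` parameter are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data mirroring the rows shown in the question (illustrative only).
df = pd.DataFrame({
    'person_with_parent_names': [
        '|John|Doe|Bobba|',
        '|Fett|Bobba|',
        '|Abe|Bea|Cosby|',
    ]
})

cols = ['grandfather', 'father', 'person']

def parse_parent_names(value, width=len(cols)):
    # Strip the leading/trailing pipes so split() yields only the names.
    names = value.strip('|').split('|')
    # Left-pad with empty strings so the person is always the last entry.
    # Assumes len(names) <= width; longer rows would need separate handling.
    return [''] * (width - len(names)) + names

# One pass over the column; reuse the original index to keep rows aligned.
df1 = pd.DataFrame(
    [parse_parent_names(x) for x in df['person_with_parent_names']],
    columns=cols,
    index=df.index,
)
print(df1)
```

Because `df1` shares the index of `df`, it could be attached back with `df.join(df1)` if the new columns are wanted alongside the original one.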
Upvotes: 1