Reputation: 199
Let's say I have this dataframe df:
index col1 col2
1 48 alpha bravo charlie
2 52 alpha bravo
3 49 alpha bravo charlie delta echo
4 12 alpha bravo
5 6 alpha
What I want is to delete the first word in col2 when there are more than 2 words in the cell.
So it should look like this:
index col1 col2
1 48 bravo charlie
2 52 alpha bravo
3 49 bravo charlie delta echo
4 12 alpha bravo
5 6 alpha
I have written df['col2'] = df['col2'].apply(lambda x: ' '.join(x.split(' ')[1:])), which removes the first word from every row, but I don't know how to apply the condition to my dataframe.
Upvotes: 3
Views: 177
Reputation: 369054
Using a regular expression with re.Pattern.sub:
>>> import re
>>> pattern = re.compile(r'^\S+ (?=\S+ )')
>>> pattern.sub('', 'bravo charlie delta echo')
'charlie delta echo'
>>> pattern.sub('', 'alpha')
'alpha'
>>> from functools import partial
>>> df['col2'] = df['col2'].apply(partial(pattern.sub, ''))
>>> df
col1 col2
0 48 bravo charlie
1 52 alpha bravo
2 49 bravo charlie delta echo
3 12 alpha bravo
4 6 alpha
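The same pattern can probably be applied without apply by passing it to Series.str.replace. The following is a minimal sketch, assuming the sample data from the question; the pattern is passed as a compiled object so that pandas uses Python's re engine, which supports the lookahead.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [48, 52, 49, 12, 6],
    "col2": [
        "alpha bravo charlie",
        "alpha bravo",
        "alpha bravo charlie delta echo",
        "alpha bravo",
        "alpha",
    ],
})

# Match the first word and its trailing space only when it is followed
# by another word that is itself followed by a space (3 or more words).
pattern = re.compile(r"^\S+ (?=\S+ )")

# A compiled pattern requires regex=True.
df["col2"] = df["col2"].str.replace(pattern, "", regex=True)
print(df)
```

Rows with one or two words do not match the pattern, so they are left unchanged.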
Upvotes: 1
Reputation: 862591
Add an if-else expression that counts the spaces:
df['col2'] = df['col2'].apply(lambda x: ' '.join(x.split()[1:]) if x.count(' ') > 1 else x)
Or:
df['col2'] = df['col2'].apply(lambda x: x.split(maxsplit=1)[1] if x.count(' ') > 1 else x)
print (df)
index col1 col2
0 1 48 bravo charlie
1 2 52 alpha bravo
2 3 49 bravo charlie delta echo
3 4 12 alpha bravo
4 5 6 alpha
Pandas alternative:
df['col2'] = df['col2'].mask(df['col2'].str.count(' ') > 1, df['col2'].str.split(n=1).str[1])
print(df)
index col1 col2
0 1 48 bravo charlie
1 2 52 alpha bravo
2 3 49 bravo charlie delta echo
3 4 12 alpha bravo
4 5 6 alpha
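Counting spaces assumes the words are separated by exactly one space each. If the cells might contain repeated or leading spaces, one possible variation is to count the words themselves. This is only a sketch under that assumption, using the sample data from the question; the names words and more_than_two are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [48, 52, 49, 12, 6],
    "col2": [
        "alpha bravo charlie",
        "alpha bravo",
        "alpha bravo charlie delta echo",
        "alpha bravo",
        "alpha",
    ],
})

# Split each cell on any run of whitespace into a list of words.
words = df["col2"].str.split()

# True for rows that hold more than two words.
more_than_two = words.str.len() > 2

# Where the condition holds, replace the cell with all words but the first.
df["col2"] = df["col2"].mask(more_than_two, words.str[1:].str.join(" "))
print(df)
```

Note that rejoining with a single space also normalizes the whitespace in the rows that are changed.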
Upvotes: 6