Reputation: 1679
I have a dataframe where every row is either a word or a punctuation mark. I want to iterate through the dataframe and, whenever a row contains punctuation, combine it with the previous row.
For example, I want to convert:
word
0 hello
1 ,
2 how
3 are
4 you
5 ?
Into:
word
0 hello,
2 how
3 are
4 you?
Thanks.
Upvotes: 4
Views: 901
Reputation: 13413
Yet another approach: concatenate each punctuation row onto the previous row using .shift(-1):
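from string import punctuation
df.loc[df["word"].shift(-1).isin(list(punctuation)), "word"] = df["word"] + df["word"].shift(-1)
# drop the standalone punctuation rows, which are now merged into the row above
df = df[~df["word"].isin(list(punctuation))][["word"]]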
df:
word
0 hello,
2 how
3 are
4 you?
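In case it helps to run this end to end, here is a minimal self-contained sketch of the same shift idea. The sample frame is rebuilt from the question, and the names punct, next_is_punct and result are only for illustration.

```python
from string import punctuation

import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})

punct = list(punctuation)

# True on rows whose *next* row is a single punctuation character
next_is_punct = df["word"].shift(-1).isin(punct)

# Append the following punctuation mark to those rows (assignment aligns on the index)
df.loc[next_is_punct, "word"] = df["word"] + df["word"].shift(-1)

# Drop the standalone punctuation rows; the original index labels are kept
result = df[~df["word"].isin(punct)]
print(result)
```

Note that this sketch only handles a single punctuation row after a word. If two punctuation rows follow each other (say ? and then !), they would not be fully merged into the preceding word.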
Upvotes: 0
Reputation: 150825
You can use isin and cumsum:
# set of punctuation marks
punctuations = {',', '?'}
# blocks: every non-punctuation row starts a new block
blocks = (~df['word'].isin(punctuations)).cumsum()
# groupby
df['word'].groupby(blocks).sum()
Output:
word
1 hello,
2 how
3 are
4 you?
Name: word, dtype: object
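To see why the cumsum trick works, it can help to look at the intermediate block labels. The following is a rough sketch that rebuilds the sample frame from the question; is_word and merged are illustrative names, and it joins the strings with "".join instead of sum, which should give the same result for string data.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"word": ["hello", ",", "how", "are", "you", "?"]})
punctuations = {",", "?"}

# True for ordinary words, False for punctuation rows
is_word = ~df["word"].isin(punctuations)

# The running count only increases on word rows, so each punctuation
# row inherits the label of the word just before it
blocks = is_word.cumsum()
print(blocks.tolist())  # [1, 1, 2, 3, 4, 4]

# Concatenate all strings that share a block label
merged = df["word"].groupby(blocks).agg("".join)
print(merged.tolist())  # ['hello,', 'how', 'are', 'you?']
```

Because every punctuation row reuses the previous label, a run of several punctuation rows ends up in the same block as the word before it.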
Upvotes: 0
Reputation: 294586
match and cumsum
df.groupby((~df.word.str.match(r'\W')).cumsum(), as_index=False).sum()
word
0 hello,
1 how
2 are
3 you?
isin
Also, without as_index=False (the group labels are kept as the index):
from string import punctuation
df.groupby((~df.word.isin(list(punctuation))).cumsum()).sum()
word
word
1 hello,
2 how
3 are
4 you?
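If you need this in more than one place, the grouping idea could be wrapped in a small helper. This is only a sketch: merge_punctuation is a hypothetical name, and it assumes a token counts as punctuation when it consists entirely of non-word characters (regex \W+), so multi-character tokens such as ... are merged as well.

```python
import pandas as pd


def merge_punctuation(df, col="word"):
    """Hypothetical helper: glue punctuation-only rows onto the preceding row."""
    # Assumption: a token made up only of non-word characters is punctuation
    is_punct = df[col].str.fullmatch(r"\W+")
    # Every non-punctuation row opens a new block
    blocks = (~is_punct).cumsum()
    # Join the strings in each block and renumber the rows from 0
    joined = df[col].groupby(blocks).agg("".join)
    return joined.reset_index(drop=True).to_frame()


sample = pd.DataFrame({"word": ["wait", "...", "what", "?", "!"]})
out = merge_punctuation(sample)
print(out)
```

A punctuation row at the very top of the frame has nothing before it, so in this sketch it simply stays in its own block.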
Upvotes: 5