I have a df that looks like this:
words                                           col_a  col_b
I guess, because I have thought over that. Um,      1      0
That? yeah.                                         1      1
I don't always think you're up to something.        0      1
I want to split df.words after every punctuation character (.,?!:;) so that each piece goes into a separate row. However, I want each new row to keep the col_a and col_b values of the row it came from. For example, the above df should become:
words                                         col_a  col_b
I guess,                                          1      0
because I have thought over that.                 1      0
Um,                                               1      0
That?                                             1      1
yeah.                                             1      1
I don't always think you're up to something.      0      1
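To make the example reproducible, here is a minimal sketch that rebuilds the sample input above as a pandas DataFrame. The values are copied from the table; the name `df` matches the question and nothing else is assumed.

```python
import pandas as pd

# Sample input copied from the question's table
df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 0],
    "col_b": [0, 1, 1],
})
print(df)
```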
One way is to use str.findall with the pattern (.*?[.,?!:;]), which matches each of these punctuation signs together with the characters that precede it (non-greedy), and then explode the resulting lists:
(df.assign(words=df.words.str.findall(r'(.*?[.,?!:;])'))
.explode('words')
.reset_index(drop=True))
   words                                         col_a  col_b
0  I guess,                                          1      0
1  because I have thought over that.                 1      0
2  Um,                                               1      0
3  That?                                             1      1
4  yeah.                                             1      1
5  I don't always think you're up to something.      0      1
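The printed table hides two side effects of this pattern: every piece after the first keeps its leading space (the value is " yeah.", not "yeah."), and any text after the last punctuation mark is dropped, so a string with no punctuation at all becomes an empty list, which explode turns into NaN. Below is a minimal sketch of one possible variant, not part of the original answer: it adds a `|.+` alternative to keep trailing text, then strips whitespace. The fourth row ("no punctuation here") is hypothetical, added only to exercise that case.

```python
import pandas as pd

# Sample frame from the question, plus one hypothetical extra row
# with no punctuation mark at all.
df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
        "no punctuation here",
    ],
    "col_a": [1, 1, 0, 0],
    "col_b": [0, 1, 1, 0],
})

# Pattern from the answer: text after the last punctuation mark is never
# matched, so the last row yields an empty list.
original = df["words"].str.findall(r"(.*?[.,?!:;])")

# Variant: when no punctuation mark follows, the "|.+" alternative
# matches the remaining text instead.
out = (
    df.assign(words=df["words"].str.findall(r".*?[.,?!:;]|.+"))
      .explode("words")
      # pieces after the first keep their leading space, so strip it
      .assign(words=lambda d: d["words"].str.strip())
      # drop pieces that consisted only of whitespace
      .loc[lambda d: d["words"] != ""]
      .reset_index(drop=True)
)
print(out)
```

Because explode repeats the original index for each piece, col_a and col_b are carried over unchanged, which is what keeps the per-row values intact in both versions.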