Reputation: 73
I am trying to split a sentence into two columns (Review and Sentiment).
Suppose we have the sentence:
Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive
Here, "Positive" is the Sentiment and everything before it is the Review.
i) I cannot use \s as the delimiter to split the sentence into two columns (Review, Sentiment), because whitespace also occurs throughout the review.
ii) If I use '.' as the delimiter, the sentence contains multiple occurrences of '.'.
I have written code to remove the multiple occurrences of '.'; it is shown below:
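import re
def clean(sentence):
    # Remove the listed punctuation characters (note that '.' is not among them)
    clear = re.sub(r"[,|\"|\"|\'|\'|\-|!|?|\/|*|:|\\|\(|\)|;|$]",'', sentence)
    # Replace runs of a repeated non-word character (e.g. "...") with one space
    clear1 = re.sub(r'(\W)\1+',' ', clear)
    return [' '.join(clear1.split())]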
which is able to remove "..." after the word "hi" but fails for "i.e" and "$7.50".
My desired result is:
Review: Hi I earn 7 dot 50 per hour i e 7 50 hr
Sentiment: Positive
My output is:
Hi I earn 7 dot 50 per hour i.e 7.50 hr.
PS: I am using pandas to load the data as a dataframe with two columns.
Edit1: In my case, the sentiment is always either "Positive" or "Negative".
Edit2: I store this output as a CSV file and read it back with pandas (read_csv()).
Upvotes: 1
Views: 83
Reputation: 89557
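Find all groups of word characters, then take the last item as the sentiment and join the rest as the review:
>>> import re
>>> s = 'Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive'
>>> l = re.findall(r'\w+', s)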
>>> ' '.join(l[:-1])
'Hi I earn 7 dot 50 per hour i e 7 50 hr'
>>> l[-1]
'Positive'
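The same idea can be wrapped in a function that returns both columns at once. This is only a minimal sketch: the name split_review is hypothetical, and it assumes the last word token in the sentence is always the sentiment label.

```python
import re

def split_review(sentence):
    """Return (review, sentiment), assuming the last word token is the label."""
    tokens = re.findall(r"\w+", sentence)
    if not tokens:
        # Nothing but punctuation or whitespace: return two empty columns
        return "", ""
    return " ".join(tokens[:-1]), tokens[-1]

review, sentiment = split_review("Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive")
print(review)     # Hi I earn 7 dot 50 per hour i e 7 50 hr
print(sentiment)  # Positive
```

Note that \w also matches the underscore, so underscores in a review would be kept.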
Upvotes: 1
Reputation: 108
In your case, since you know that the sentiment will always be "Positive" or "Negative", you can get your two columns like this:
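import re
sentence = 'Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive'
m = re.match(r"(?P<review>.*)\. (?P<sentiment>Positive|Negative)$", sentence)
m.group('review')     # 'Hi... I earn 7 dot 50 per hour i.e $7.50/hr'
m.group('sentiment')  # 'Positive'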
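One thing to watch with this pattern is that re.match returns None when a row does not end in ". Positive" or ". Negative", so calling m.group on it would raise an AttributeError. A hedged sketch of a guarded version follows; the names PATTERN and parse are hypothetical.

```python
import re

# Same pattern as above, compiled once for reuse
PATTERN = re.compile(r"(?P<review>.*)\. (?P<sentiment>Positive|Negative)$")

def parse(sentence):
    """Return (review, sentiment), or None if the sentence does not fit the pattern."""
    m = PATTERN.match(sentence)
    if m is None:
        return None
    return m.group("review"), m.group("sentiment")

print(parse("Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive"))
print(parse("No label here"))  # None
```

Rows that come back as None can then be inspected or dropped before building the dataframe.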
Upvotes: 0
Reputation: 91430
How about re.split?
This will split on whitespace only if it is followed by Positive or Negative.
import re
sentence = 'Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive'
res = re.split(r'\s+(?=Positive|Negative)', sentence)
print(res)
Output:
['Hi... I earn 7 dot 50 per hour i.e $7.50/hr.', 'Positive']
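If a review can itself contain the word Positive or Negative, the lookahead above also splits there. A possible variation, sketched here with a made-up sentence (tricky) and a hypothetical name (END_LABEL), anchors the lookahead to the end of the string:

```python
import re

# Lookahead anchored with $ so only a label at the very end triggers the split
END_LABEL = re.compile(r"\s+(?=(?:Positive|Negative)$)")

# Made-up example in which the review itself contains a label word
tricky = "Not a Positive experience. Negative"

unanchored = re.split(r"\s+(?=Positive|Negative)", tricky)
anchored = END_LABEL.split(tricky)

print(unanchored)  # ['Not a', 'Positive experience.', 'Negative']
print(anchored)    # ['Not a Positive experience.', 'Negative']
```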
Upvotes: 0
Reputation: 199
If you just need the last occurrence of the dot, you can use this regex:
\.(?!.*\.)
Example: https://regex101.com/r/OYkupF/2
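In Python, this regex could be used with re.split roughly as follows. This is a sketch only: the function name split_on_last_dot is hypothetical, and it assumes the last dot in the sentence is the one separating the review from the sentiment.

```python
import re

def split_on_last_dot(sentence):
    """Split at the last '.' in the sentence; return None if there is no dot."""
    parts = re.split(r"\.(?!.*\.)", sentence, maxsplit=1)
    if len(parts) != 2:
        return None
    review, sentiment = parts
    return review.strip(), sentiment.strip()

print(split_on_last_dot("Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive"))
# ('Hi... I earn 7 dot 50 per hour i.e $7.50/hr', 'Positive')
```

The review still contains its inner dots, so it would need a further cleaning step to reach the desired output.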
Upvotes: 1
Reputation: 917
If the sentiment is only ever 'Positive' or 'Negative', then:
def clean(sentence):
    tokens = sentence.split()
    return " ".join(tokens[:-1]), tokens[-1]
which gives a tuple:
('Hi... I earn 7 dot 50 per hour i.e $7.50/hr.', 'Positive')
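To get from this tuple to the cleaned review and a CSV with two columns, one option is to strip the punctuation from the review part and write the rows with the standard csv module. The sketch below is only illustrative: the names split_and_clean and to_csv_text are hypothetical, and it assumes every sentence has at least two whitespace-separated tokens.

```python
import csv
import io
import re

def split_and_clean(sentence):
    """Split off the last token as the sentiment, then keep only word characters in the review."""
    review, sentiment = sentence.rsplit(maxsplit=1)
    return " ".join(re.findall(r"\w+", review)), sentiment

def to_csv_text(sentences):
    """Build CSV text with a Review,Sentiment header from raw sentences."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Review", "Sentiment"])
    for sentence in sentences:
        writer.writerow(split_and_clean(sentence))
    return buf.getvalue()

print(to_csv_text(["Hi... I earn 7 dot 50 per hour i.e $7.50/hr. Positive"]))
```

Text in this shape, once saved to a file, can be loaded with pandas read_csv as two columns.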
Upvotes: 0