Reputation: 25
I have the dataset below that I want to tidy up.
Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content :
Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content :
Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it
Sorry I didn't like this phone
I want to use Python to reshape this data into the format below.
Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content : Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content : Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it Sorry I didn't like this phone
I want to move the review text up onto the same line as the "Review Content :" label, right after the colon, but I do not know how.
Upvotes: 0
Views: 60
Reputation: 1213
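import re
text = '''your_text_here'''
# Raw string so that \s reaches the regex engine instead of being an invalid string escape.
# Collapse the whitespace (including the newline) after the label into one space.
text = re.sub(r"Review Content :\s+", "Review Content : ", text)
# Put two newlines in front of every "Review Title" label to separate the reviews.
text = re.sub("Review Title : ", "\n\nReview Title : ", text)
# Drop the newlines that were added in front of the very first review.
text = text.strip()
print(text)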
Using the re library makes it easier to operate on strings:
- The first sub replaces the run of whitespace characters after "Review Content :" with a single space, so the content ends up on the same line as the "Review Content" label.
- The second sub adds two newline characters before each "Review Title" label.
- strip() removes whitespace from the beginning and end of the string, which effectively removes the two newline characters that were added right before the very first "Review Title" in the previous step.
Upvotes: 2
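The substitutions in this answer do not touch review text that continues on a further line, as in the second review of the question ("Sorry I didn't like this phone"). One possible way to cover that case is a line-based pass. This is only a sketch, and it assumes that every field starts with one of the four labels from the sample followed by a colon, and that any other non-empty line is a continuation of the field before it. The names LABELS, LABEL_RE and tidy_reviews are made up for this illustration.

```python
import re

# Assumed field labels, taken from the sample in the question.
LABELS = ("Review Title", "Upvotes", "Downvotes", "Review Content")
LABEL_RE = re.compile(r"^(?:%s)\s*:" % "|".join(map(re.escape, LABELS)))


def tidy_reviews(text):
    """Join unlabelled lines onto the labelled line that precedes them."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if LABEL_RE.match(line) or not out:
            # A new field starts here (or this is the very first line).
            out.append(line)
        else:
            # No known label: treat the line as a continuation of the previous field.
            out[-1] = out[-1] + " " + line
    return "\n".join(out)


if __name__ == "__main__":
    text = '''your_text_here'''
    print(tidy_reviews(text))
```

A review line that itself begins with one of the labels (for example a comment starting with "Upvotes :") would be mistaken for a new field, so the label list may need adjusting for the real data.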