Laplace M

Reputation: 25

How to tidy up this dataset using Python 2.7

I have the dataset below that I want to tidy up.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content :
Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content :
Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it
  Sorry I didn't like this phone

I want to use Python to reshape this data into the format below.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content : Hank all time this device ... fews day speakar sound not clear output

Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content : Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it Sorry I didn't like this phone

I want to move the review text onto the same line as the "Review Content :" label, right after the colon, but I do not know how.

Upvotes: 0

Views: 60

Answers (1)

kszl

Reputation: 1213

import re

text = '''your_text_here'''

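# Pull the review text up onto the "Review Content :" line
text = re.sub(r"Review Content :\s+", "Review Content : ", text)
# Put two newlines (one blank line) before each review title
text = re.sub(r"\s*Review Title : ", "\n\nReview Title : ", text)
# Drop the newlines that were added before the very first review
text = text.strip()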

print(text)

The re library makes this kind of string manipulation easier:

  • the first sub replaces the run of whitespace characters (including the newline) after "Review Content :" with a single space, so the content ends up on the same line as the "Review Content :" label
  • the second sub puts two newline characters before each "Review Title" label, which separates the reviews from one another
  • strip() removes whitespace from the beginning and end of the string, which effectively removes the two newline characters that the previous step added right before the very first "Review Title"
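
One thing the substitutions above do not cover: when the review content itself is wrapped over several lines (like the "Sorry I didn't like this phone" line in the question), the extra lines stay on their own line. A minimal line-by-line sketch that also joins those continuation lines could look like the following. The function name tidy_reviews and the LABELS tuple are made up for illustration, and it assumes that no line of review text starts with one of the four labels.

```python
# Field labels exactly as they appear in the question's data (assumed complete).
LABELS = ("Review Title :", "Upvotes :", "Downvotes :", "Review Content :")


def tidy_reviews(text):
    """Join each field's continuation lines onto the line holding its label."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines; separators are re-added below
        if line.startswith(LABELS):
            if line.startswith("Review Title :") and out:
                out.append("")  # one blank line between reviews
            out.append(line)
        elif out:
            # Not a label, so treat it as a continuation of the previous field.
            out[-1] = out[-1] + " " + line
        else:
            out.append(line)  # stray text before the first label is kept as-is
    return "\n".join(out)


sample = """
Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content :
Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content :
Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it
  Sorry I didn't like this phone
"""

tidied = tidy_reviews(sample)
print(tidied)
```

It uses only str methods that exist in both Python 2.7 and Python 3, so it should run unchanged on either. If a review's text can legitimately begin with something like "Upvotes :", the label check would need to be stricter.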

Upvotes: 2
