How to tidy up this dataset using Python 2.7

Question

I have the dataset below that I want to tidy up.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content :
Hank all time this device ... fews day speakar sound not clear output
Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content :
Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it
  Sorry I didn't like this phone

I want to use Python to reshape this data into the format below.

Review Title : Very poor
Upvotes : 1
Downvotes : 0
Review Content : Hank all time this device ... fews day speakar sound not clear output

Review Title : Don't waste your money
Upvotes : 1
Downvotes : 1
Review Content : Don't buy this product , its not good .just a waste of money.it starts showing small defects from starting few months of use and then after one year after warranty is over its mother was not working .and u can .ever fix it Sorry I didn't like this phone

I want to move the text that follows "Review Content :" up onto the same line as the label, as shown above, but I don't know how.

kszl · Accepted Answer

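import re

text = '''your_text_here'''  # paste the raw reviews here

# Collapse the whitespace after the label (including the line break) to one space
text = re.sub(r"Review Content :\s+", "Review Content : ", text)
# Put two newlines (a blank line) before every review title
text = re.sub("Review Title : ", "\n\nReview Title : ", text)
# Remove the two newlines the previous step added before the first title
text = text.strip()

print(text)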
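
Because `\s` matches newlines as well as spaces, the first substitution is the step that pulls the review text up onto the label's line. A quick, self-contained check on one record from the question (the `record` string is simply that sample pasted in):

```python
import re

# One record from the question: the label and its content are on separate lines.
record = "Review Content :\nHank all time this device ... fews day speakar sound not clear output"

# \s+ consumes the newline after the colon, so the content joins the label's line.
joined = re.sub(r"Review Content :\s+", "Review Content : ", record)
print(joined)
```

The same call works unchanged under Python 2.7 and Python 3.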

Using the re module makes these string edits straightforward:

The first sub() replaces the run of whitespace after "Review Content :" (including the line break) with a single space, so the review text ends up on the same line as the "Review Content" label.
The second sub() inserts two newline characters before every "Review Title" label, which leaves a blank line between reviews.
strip() removes whitespace from the beginning and end of the string, which drops the two newline characters that the previous step added before the very first "Review Title".
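
The two substitutions only join the first line of each review's content to its label. The indented last line of the second review ("Sorry I didn't like this phone") stays on its own line, whereas the desired output appends it to the content. A minimal line-based sketch that also folds such continuation lines, assuming the only field labels are the four that appear in the sample and that no line of review text itself begins with one of them (`LABELS` and `tidy_reviews` are hypothetical names):

```python
# Field labels, assumed from the sample data in the question.
LABELS = ("Review Title :", "Upvotes :", "Downvotes :", "Review Content :")


def tidy_reviews(text):
    """Hypothetical helper: one line per field, blank line between reviews."""
    out = []  # each entry becomes one output line, e.g. "Upvotes : 1"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines in the input
        if line.startswith(LABELS):
            # A new title after existing output starts a new review.
            if line.startswith("Review Title :") and out:
                out.append("")
            out.append(line)
        elif out:
            # Any other line continues the previous field: join with one space.
            out[-1] += " " + line
    return "\n".join(out)
```

Because each input line is stripped before joining, the leading spaces in front of "Sorry I didn't like this phone" disappear, and calling `tidy_reviews(text)` on the question's sample should give the layout shown in the question. `str.startswith` accepting a tuple works in both Python 2.7 and 3.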
