Reputation: 23
I want to preprocess a Weka ARFF file containing 2000 lines for an NLP project (sentiment analysis).
I want code that just adds a single quotation mark at the start and end of each sentence. For example, this is a sample of my dataset:
The Da Vinci Code is one of the most beautiful movies ive ever seen.,1
The Da Vinci Code is an * amazing * book, do not get me wrong.,1
then I turn on the light and the radio and enjoy my Da Vinci Code.,1
The Da Vinci Code was REALLY good.,1
i love da vinci code....,1
I want the output to be:
'The Da Vinci Code is one of the most beautiful movies ive ever seen.',1
'The Da Vinci Code is an * amazing * book, do not get me wrong.',1
'then I turn on the light and the radio and enjoy my Da Vinci Code.',1
'The Da Vinci Code was REALLY good.',1
'i love da vinci code....',1
I just want to add a single quotation mark at the beginning and end of each sentence (before the ,1).
I would really appreciate any help with this.
Is there any tool that I can use instead of writing code?
Upvotes: 0
Views: 128
Reputation: 3885
You could use regular expressions to achieve this. Regular expressions are a powerful formalism for pattern matching in strings. Many existing tools support them, which lets you match and replace the text you want without writing any code yourself.
To match and replace using regular expressions (regexp), you need two parts:
Match:
/([^.]+)(\.+)(,1\s*)/g
Substitution:
'$1$2'$3
Here $1 is the sentence text, $2 is the closing period(s), and $3 is the ,1 label. The whitespace after ,1 is optional (\s*) because line-oriented tools such as sed and Get-Content strip the newline before matching.
You can view an interactive version of the above Match and Substitution here
Now you can use that match and substitution with your favorite regexp tool.
Like sed (GNU sed syntax):
sed -i -E "s/([^.]+)(\.+)(,1\s*)/'\1\2'\3/g" yourfile.txt
Or Windows PowerShell:
(Get-Content yourfile.txt) -replace '([^.]+)(\.+)(,1\s*)', '''$1$2''$3' | Out-File output.txt
Other tools might use a slightly different syntax. The patterns above only match lines that end in one or more periods followed by ,1, so other labels or other end punctuation would need an adjusted pattern.
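If you do end up scripting it, here is a minimal Python sketch of the same idea. It is not part of the approach above but a variation on it: it splits each data line at the last comma, so it should also cope with labels other than 1 and sentences that do not end in a period. The names quote_sentence and quote_file are hypothetical, and two things are assumptions on my part: that every data line has the form sentence,integer-label, and that Weka expects single quotes inside a quoted string to be escaped with a backslash.

```python
import re

# Everything before the LAST comma is the sentence; the trailing integer is the label.
# Assumption: every data line looks like "<sentence>,<integer label>".
LINE_PATTERN = re.compile(r"^(?P<text>.*),(?P<label>\d+)\s*$")


def quote_sentence(line: str) -> str:
    """Wrap the sentence part of a 'sentence,label' line in single quotes."""
    # Leave ARFF header lines (@relation, @attribute, @data) and % comments alone.
    if line.startswith(("@", "%")):
        return line
    match = LINE_PATTERN.match(line)
    if match is None:
        # Blank lines or anything that does not look like a data row.
        return line
    # Assumption: backslash-escape backslashes and single quotes inside the sentence.
    text = match.group("text").replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}',{match.group('label')}"


def quote_file(src_path: str, dst_path: str) -> None:
    """Read src_path line by line and write the quoted version to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(quote_sentence(line.rstrip("\n")) + "\n")


# Small demonstration on lines from the question.
for sample in [
    "The Da Vinci Code is an * amazing * book, do not get me wrong.,1",
    "i love da vinci code....,1",
]:
    print(quote_sentence(sample))
```

To convert a whole file you would call something like quote_file("yourfile.txt", "output.txt"). Writing to a separate output file keeps the original intact in case the pattern does not fit some of your lines.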
Upvotes: 0