

Convert Unicode escape sequences to characters in a file using Ruby

I have this string in a file named code.txt:

"class Solution {\u000Apublic:\u000A    vector\u003Cvector\u003Cint\u003E\u003E insert(vector\u003Cvector\u003Cint\u003E\u003E\u0026 intervals, vector\u003Cint\u003E\u0026 newInterval) {\u000A        int len \u003D intervals.size()\u003B\u000A        int index \u003D 0\u003B\u000A        vector\u003Cvector\u003Cint\u003E \u003E ans\u003B\u000A        \u000A\u000A        while(index \u003C len \u0026\u0026 intervals[index][1] \u003C newInterval[0]) ans.push_back(intervals[index++])\u003B\u000A        \u000A        while(index \u003C len \u0026\u0026 intervals[index][0] \u003C\u003D newInterval[1]) {\u000A            newInterval[0] \u003D min(intervals[index][0], newInterval[0])\u003B\u000A            newInterval[1] \u003D max(intervals[index][1], newInterval[1])\u003B\u000A            index++\u003B\u000A        }\u000A        \u000A        ans.push_back(newInterval)\u003B\u000A        \u000A        while(index \u003C len) ans.push_back(intervals[index++])\u003B\u000A\u000A        return ans\u003B \u000A    }\u000A}\u003B                         "

I would like to convert this string back into C++ source code and write it to a file named solution.cpp.

The content of solution.cpp should look like this:

class Solution {
public:
    vector<vector<int>> insert(vector<vector<int>>& intervals, vector<int>& newInterval) {
        int len = intervals.size();
        int index = 0;
        vector<vector<int> > ans;

        while(index < len && intervals[index][1] < newInterval[0]) ans.push_back(intervals[index++]);

        while(index < len && intervals[index][0] <= newInterval[1]) {
            newInterval[0] = min(intervals[index][0], newInterval[0]);
            newInterval[1] = max(intervals[index][1], newInterval[1]);
            index++;
        }

        ans.push_back(newInterval);

        while(index < len) ans.push_back(intervals[index++]);

        return ans;
    }
};

I have tried forcing/converting the encoding to UTF-8, but the escape sequences stay exactly the same:

code = File.read('code.txt')
code = code.encode('UTF-8')
file = File.open('solution.cpp', "w:UTF-8")
file.write(code)
file.close

How can I do this? Thank you.


Answers (1)



I reproduced your problem with your code and got the same result you described.
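The reason encode('UTF-8') has no effect is that the file does not hold a semicolon stored in some other encoding; it holds the six ASCII characters \, u, 0, 0, 3, B. That text is already valid UTF-8, so re-encoding returns it unchanged. A small sketch to illustrate, where the single-quoted literal stands in for what File.read returns from code.txt:

```ruby
# Single quotes keep the backslashes literal, which is what
# File.read returns for the text stored in code.txt.
escaped = 'int len \u003D 0\u003B'

# Re-encoding to UTF-8 leaves the text untouched: it is already valid UTF-8.
reencoded = escaped.encode('UTF-8')
puts reencoded         # prints: int len \u003D 0\u003B
puts reencoded.length  # prints: 22

# Only Ruby's parser turns \u003D into "=", and only in double-quoted literals.
parsed = "int len \u003D 0\u003B"
puts parsed            # prints: int len = 0;
```

So the escapes have to be decoded explicitly when the text comes from a file.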

I noticed that \u003B, for example, is the escape for the Unicode code point U+003B, the semicolon. So I scanned the string for every \uXXXX escape with the regex /\\u(.{4})/, which captures the four hexadecimal digits of the code point. Then I used gsub! and Array#pack to replace each escape with the character it encodes:

[$1.to_i(16)].pack('U') # => "\n", "\n", "<", "&", "\n", "=" ...etc.
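Wrapped into a small helper, the same gsub + Array#pack technique could look like the sketch below. The method name decode_unicode_escapes is hypothetical, and it uses \h{4} (exactly four hex digits) instead of .{4}, a slightly stricter variant that leaves a backslash-u followed by non-hex text alone.

```ruby
# Hypothetical helper: replace every \uXXXX escape (four hex digits)
# with the character at that Unicode code point.
def decode_unicode_escapes(text)
  text.gsub(/\\u(\h{4})/) do
    # $1 holds the four hex digits captured by the regex.
    [$1.to_i(16)].pack('U')
  end
end

puts decode_unicode_escapes('vector\u003Cint\u003E\u0026 v\u003B')
# prints: vector<int>& v;
```

Using the non-bang gsub returns a new string, so the original text is not modified in place.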

Finally, I wrote the result to a file. My final approach looks like this:

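code = File.read('code.txt')
# Replace every \uXXXX escape with the character it encodes.
code.gsub!(/\\u(.{4})/) do
  [$1.to_i(16)].pack('U')
end
# Strip the surrounding quotes (gsub, not gsub!, which returns nil if nothing matches).
File.open('solution.cpp', 'w') { |f| f.puts code.gsub(/\A"|"\Z/, '') }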

Also note that I call gsub again at the end to remove the leading and trailing double quotes before writing to the file.
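As an alternative (a swapped-in technique, not the approach above): the content of code.txt is a double-quoted string whose escapes all use the \uXXXX form, so it also happens to be a valid JSON string literal. Assuming that holds (no raw newlines, tabs or unescaped quotes between the quotes), Ruby's standard json library can decode the escapes and drop the surrounding quotes in one step. This relies on json 2.0 or newer (bundled since Ruby 2.4), which accepts a bare string as a top-level JSON value.

```ruby
require 'json'

# Stand-in for File.read('code.txt'): a quoted string with \uXXXX escapes.
raw = '"int len \u003D intervals.size()\u003B\u000Areturn len\u003B"'

# JSON.parse decodes the escapes and removes the enclosing quotes.
source = JSON.parse(raw.strip)

puts source
# prints:
# int len = intervals.size();
# return len;

# For the real files (paths as in the question):
# File.write('solution.cpp', JSON.parse(File.read('code.txt').strip))
```

If code.txt ever contains characters that JSON does not allow inside a string, JSON.parse raises JSON::ParserError, whereas the regex approach simply leaves them as they are.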
