Vy Do

Reputation: 52516

Error encoding when parsing response from REST request

Context: I use Selenium to scrape the page at http://bizhub.vn/tech/start-up-cvn-loyalty-receives-vnd11-billion-investment_318377.html , extract the content of the news post, and then summarize it to 4 sentences using the text summarization API from meaningcloud.com.

I am using .NET 5.0.100-preview.8.20417.9 with ASP.NET Core Web API 5. I have the following code.

File SysUtil.cs

using System;

namespace bizhubdotvn_tech.Controllers
{
    public class SysUtil
    {
        public static String StringEncodingConvert(String strText, String strSrcEncoding, String strDestEncoding)
        {
            System.Text.Encoding srcEnc = System.Text.Encoding.GetEncoding(strSrcEncoding);
            System.Text.Encoding destEnc = System.Text.Encoding.GetEncoding(strDestEncoding);
            byte[] bData = srcEnc.GetBytes(strText);
            byte[] bResult = System.Text.Encoding.Convert(srcEnc, destEnc, bData);
            return destEnc.GetString(bResult);
        }
    }
}

The method that calls the API:

        public static string SummaryText(string newsContent)
        {
            var client = new RestClient("https://api.meaningcloud.com/summarization-1.0");
            client.Timeout = -1;
            var request = new RestRequest(Method.POST);
            request.AddParameter("key", "25870359b682ec3c93f9becd850eb442");
            request.AddParameter("txt", JsonEncodedText.Encode(newsContent));
            request.AddParameter("sentences", 4);
            IRestResponse response = client.Execute(request);
            var mm = JObject.Parse(response.Content);
            string raw_string = (string)mm["summary"];
            // FIXME: this produces strange characters.
            string foo2 = SysUtil.StringEncodingConvert(raw_string, "Windows-1251", "UTF-8");
            Console.WriteLine("summary4 = " + foo2);
            return foo2;
        }

response.Content =

"{\"status\":{\"code\":\"0\",\"msg\":\"OK\",\"credits\":\"1\",\"remaining_credits\":\"19756\"},\"summary\":\"NextPay Joint Stock Company on Monday announced it had invested VND11 billion (US$473,000) in CNV Loyalty.Established at the end of 2017, CNV Loyalty creates customer care applications for businesses.Nguyen Tuan Phu, founder cum CEO of CNV Loyalty said: \\\\u201CWith only the cost of VND50 - 150 million, significantly lower than building a customer care system in a traditional way, Loyalty designs a customised application for the business with its own brand to interact directly with customers. In particular, the rate of accessing customers accurately up to 95 per cent.\\\\u201DThe start-up has received an investment of VND11 billion from NextPay and Next100 - investment fund from Nguyen Hoa Binh after two months of an appraisal.Nguyen Huu Tuat, CEO of NextPay said: \\\\u201CWe are living in a digital economy whose main form is the app economy. [...] CNV Loyalty is a solution to help brands always present directly, interact directly, understand their customers directly at a zero cost. [...] The investment of NextPay will help CNV Loyalty to invest more deeply in technology and products.With the market heavily influenced by COVID-19, Vietnamese start-ups still developing and receiving investment from domestic investors without being dependent on venture capital abroad is a great encouragement to Viet Nam\\\\u0027s start-up community.\"}"

foo2 =

"NextPay Joint Stock Company on Monday announced it had invested VND11 billion (US$473,000) in CNV Loyalty.Established at the end of 2017, CNV Loyalty creates customer care applications for businesses.Nguyen Tuan Phu, founder cum CEO of CNV Loyalty said: \\u201CWith only the cost of VND50 - 150 million, significantly lower than building a customer care system in a traditional way, Loyalty designs a customised application for the business with its own brand to interact directly with customers. In particular, the rate of accessing customers accurately up to 95 per cent.\\u201DThe start-up has received an investment of VND11 billion from NextPay and Next100 - investment fund from Nguyen Hoa Binh after two months of an appraisal.Nguyen Huu Tuat, CEO of NextPay said: \\u201CWe are living in a digital economy whose main form is the app economy. [...] CNV Loyalty is a solution to help brands always present directly, interact directly, understand their customers directly at a zero cost. [...] The investment of NextPay will help CNV Loyalty to invest more deeply in technology and products.With the market heavily influenced by COVID-19, Vietnamese start-ups still developing and receiving investment from domestic investors without being dependent on venture capital abroad is a great encouragement to Viet Nam\\u0027s start-up community."


How can I fix the problem with \\u201C? I am looking for a general solution that covers this whole class of escaped characters, not just this one specific character.

Upvotes: 0

Views: 543

Answers (3)

Vy Do

Reputation: 52516

Based on Andy's answer:

private static readonly HttpClient _httpClient = new HttpClient();

private sealed class MeaningResponseModel
{
    [JsonProperty("summary")]
    public string Summary { get; set; }
}

private static async Task<MeaningResponseModel> GetMeaningfulDataAsync(string key, int sentences, string content)
{
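    // Escape the article text so characters such as '&', '#' or spaces cannot break the query string.
    var queryString = $"key={key}&sentences={sentences}&txt={Uri.EscapeDataString(content)}";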
    using (var req = new HttpRequestMessage(HttpMethod.Post, new UriBuilder("https://api.meaningcloud.com/summarization-1.0") { Query = queryString }.Uri))
    {
        using (var res = await _httpClient.SendAsync(req))
        {
            res.EnsureSuccessStatusCode();
            using (var s = await res.Content.ReadAsStreamAsync())
            using (var sr = new StreamReader(s))
            using (var jtr = new JsonTextReader(sr))
            {
                return new Newtonsoft.Json.JsonSerializer().Deserialize<MeaningResponseModel>(jtr);
            }
        }
    }
}

Usage:

// Get a 4-sentence summary.
var temp = await GetMeaningfulDataAsync("25870359b682ec3c93f9becd850eb442", 4, contentAfterTrim);
string summary4 = temp.Summary;
summary4 = summary4.Replace("[...] ", "");
news.Summary4 = summary4;
Console.WriteLine("summary4 = " + summary4);
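
The RestSharp call in the question sent its parameters in the POST body rather than in the query string, and the same should be possible with HttpClient, which avoids putting a whole article into the URL. A minimal sketch, assuming the endpoint accepts a form-encoded body as it appeared to do for RestSharp; `SummaryRequestSketch`, `BuildForm`, and `BuildRequest` are hypothetical names introduced here for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

public static class SummaryRequestSketch
{
    // Hypothetical helper: builds an application/x-www-form-urlencoded body.
    // FormUrlEncodedContent percent-encodes every value, so no manual escaping is needed.
    public static FormUrlEncodedContent BuildForm(string key, int sentences, string content)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("key", key),
            new KeyValuePair<string, string>("sentences", sentences.ToString()),
            new KeyValuePair<string, string>("txt", content)
        });
    }

    // Hypothetical helper: a POST request carrying the form body.
    // Assumption: the API treats body parameters the same as query parameters.
    public static HttpRequestMessage BuildRequest(string key, int sentences, string content)
    {
        return new HttpRequestMessage(HttpMethod.Post, "https://api.meaningcloud.com/summarization-1.0")
        {
            Content = BuildForm(key, sentences, content)
        };
    }
}
```

The request returned by `BuildRequest` could then be passed to `_httpClient.SendAsync` in place of the query-string version, with the response handling left unchanged.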

Upvotes: 1

Andy

Reputation: 13547

So this may be caused by the post-processing you are doing on the returned object. You shouldn't have to do any of that. It could also be the RestClient you are using; there is no need for it, because HttpClient can do anything you would ever need with the HTTP protocol.

I signed up for a key and tried it, and it worked fine as is, without having to "re-encode" the data. Here is my implementation:

private static readonly HttpClient _httpClient = new HttpClient();

private sealed class MeaningResponseModel
{
    [JsonProperty("summary")]
    public string Summary { get; set; }
}

private static async Task<MeaningResponseModel> GetMeaningfulDataAsync(
    string key, int sentences, Uri uri)
{
    var queryString = $"key={key}&sentences={sentences}" +
        $"&url={WebUtility.UrlEncode(uri.ToString())}";

    using (var req = new HttpRequestMessage(HttpMethod.Post,
        new UriBuilder("https://api.meaningcloud.com/summarization-1.0")
    {
        Query = queryString
    }.Uri))
    {
        using (var res = await _httpClient.SendAsync(req))
        {
            res.EnsureSuccessStatusCode();
            using(var s = await res.Content.ReadAsStreamAsync())
            using(var sr = new StreamReader(s))
            using(var jtr = new JsonTextReader(sr))
            {
                return new JsonSerializer().Deserialize<MeaningResponseModel>(jtr);
            }
        }
    }
}

private async Task TestThis()
{
    var test = await GetMeaningfulDataAsync(
        "YOUR KEY HERE",
        20,
        new Uri("http://bizhub.vn/tech/start-up-cvn-loyalty-receives-vnd11-billion-investment_318377.html"));

    Console.WriteLine(test.Summary);
}

The output:

[image: working result]
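
One plausible explanation for the literal `\u201C` and `\u0027` in the question's output is the `JsonEncodedText.Encode(newsContent)` call: it turns the quotes into escape text before the request is sent, and the API then treats that text as ordinary article content. A JSON parser decodes real escapes by itself. A minimal sketch of both effects using System.Text.Json, under that assumption; `EscapeDemo`, `PreEncode`, and `ReadSummary` are hypothetical names:

```csharp
using System;
using System.Text.Json;

public static class EscapeDemo
{
    // Mirrors what the question does to the article text before sending it.
    // The default encoder replaces non-ASCII characters and the apostrophe with \uXXXX text.
    public static string PreEncode(string text)
    {
        return JsonEncodedText.Encode(text).ToString();
    }

    // Parses a response of the shape {"summary": "..."}.
    // The parser decodes JSON escapes, so no manual re-encoding is needed afterwards.
    public static string ReadSummary(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("summary").GetString();
    }
}
```

If this is indeed the cause, passing `newsContent` to the request as is, and dropping the `StringEncodingConvert` step, should be enough.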

Upvotes: 2

Chandra Prakash Ajmera

Reputation: 304

I believe you want to replace escaped special characters such as

\u201c

Nick van Esch posted an answer on the same topic in this thread, which might help you.
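
If the text already contains literal `\uXXXX` sequences, one general way to turn all of them back into characters is a regular-expression replace. This is a minimal sketch and not necessarily the approach from the linked answer; `UnicodeEscapeSketch` and `Decode` are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapeSketch
{
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces every literal \uXXXX sequence with the UTF-16 code unit it names.
    // Text that does not match the pattern is left untouched.
    public static string Decode(string text)
    {
        return EscapePattern.Replace(text, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture)).ToString());
    }
}
```

This handles every escaped character of this form, not only `\u201C`, although it treats the symptom rather than whatever produces the escapes in the first place.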

Upvotes: 0
