Prometheus

Reputation: 2017

Encoding issue with data pulled from the Facebook API

I'm pulling data from a public page with the Rfacebook package. My code is below:

fb_oauth <- fbOAuth(app_id="###", app_secret="###", extended_permissions = FALSE,
                    legacy_permissions = FALSE) #id/secret is hidden


telekom <- getPage(page="TelekomMK", token=fb_oauth)

t_head <- head(telekom, n = 1)

This is the data I'm getting:

dput(t_head)
structure(list(from_id = "86689960994", from_name = "Telekom MK", 
    message = "<U+0421><U+043C><U+0430><U+0440><U+0442><U+0444><U+043E><U+043D> <U+0437><U+0430> <U+0441><U+0430><U+043C><U+043E> 1 <U+0434><U+0435><U+043D><U+0430><U+0440>! <U+041E><U+0434><U+0431><U+0435><U+0440><U+0435><U+0442><U+0435> <U+0433><U+043E> Huawei P Smart <U+0432><U+043E> Magenta 1L <U+0438> <U+0434><U+043E><U+0431><U+0438><U+0458><U+0442><U+0435> <U+0434><U+0432><U+043E><U+0458><U+043D><U+043E> <U+043F><U+043E><U+0432><U+0435><U+045C><U+0435> <U+043C><U+043E><U+0431><U+0438><U+043B><U+0435><U+043D> <U+0438><U+043D><U+0442><U+0435><U+0440><U+043D><U+0435><U+0442>. <ed><U+00A0><U+00BD><ed><U+00B3><U+00B2>", 
    created_time = "2018-03-09T12:00:00+0000", type = "photo", 
    link = "https://www.facebook.com/TelekomMK/photos/a.90789160994.98029.86689960994/10156117256280995/?type=3", 
    id = "86689960994_10156117256945995", story = NA_character_, 
    likes_count = 6, comments_count = 0, shares_count = 0), .Names = c("from_id", 
"from_name", "message", "created_time", "type", "link", "id", 
"story", "likes_count", "comments_count", "shares_count"), row.names = 1L, class = "data.frame")

What I don't understand is why the text that is written in Cyrillic is returned as these unreadable characters. Is there a way I can fix this?

Many thanks

Upvotes: 0

Views: 60

Answers (1)

Robert Chestnutt

Reputation: 322

I just extracted the posts using

TelekomMK_posts <- getPage("TelekomMK", token = fboauth, 
                          n=10000, since = '2009/01/01', 
                          until = '2018/03/15')

It came out fine. Are you only interested in the particular post from March 9th? I use FB to extract Cyrillic a lot; PM me if you have questions. Try encoding to UTF-8.
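As a rough illustration of the UTF-8 suggestion: `<U+0421>`-style output is how R typically prints characters that the current locale cannot represent (common on Windows), so the underlying data may already be fine and only the display is affected. If the strings really do contain the literal text `<U+XXXX>`, a minimal sketch like the one below can turn those escapes back into characters. The helper name `decode_u_escapes` is hypothetical (it is not part of Rfacebook), and it does not handle the broken emoji bytes such as `<ed><U+00A0><U+00BD>` at the end of the message.

```r
# Hypothetical helper (not part of Rfacebook): replace literal "<U+XXXX>"
# escapes in a character vector with the corresponding UTF-8 characters.
decode_u_escapes <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)          # keep missing values as NA
    m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", s)   # locate every <U+XXXX> token
    codes <- regmatches(s, m)[[1]]
    if (length(codes) == 0) return(s)            # nothing to decode
    # Strip "<U+" and ">" and convert the hex code point to a character
    chars <- vapply(codes, function(cd) {
      intToUtf8(strtoi(substr(cd, 4, nchar(cd) - 1), 16L))
    }, character(1), USE.NAMES = FALSE)
    regmatches(s, m) <- list(chars)              # put the characters back in place
    s
  }, character(1), USE.NAMES = FALSE)
}

# Example with the start of the message from the question
msg <- "<U+0421><U+043C><U+0430><U+0440><U+0442><U+0444><U+043E><U+043D> Huawei P Smart"
decoded <- decode_u_escapes(msg)
```

To apply it to the whole data frame from the question, something like `telekom$message <- decode_u_escapes(telekom$message)` should work. If the strings are actually stored as UTF-8 bytes and only lack the encoding mark, `Encoding(telekom$message) <- "UTF-8"` may be enough on its own.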

Upvotes: 0
