user1050619

Reputation: 20886

Elasticsearch for normalized data

I have some data from the Facebook API.

I have a FB page that is part of multiple countries.

For example:

Assume Company-x operates in multiple countries: USA, UK, India, China.

Now, a post can be published on multiple country pages.

For example, Company-x's new innovation will be displayed on all 4 country pages.

Each of the pages will get its own comments, likes, etc.

So basically it's relational data.

Company(1) - Country(n) - Post(n) - Likes(n) - Comments(n)

I would like to know the best way to store this data in Elasticsearch and implement the search engine.

Upvotes: 0

Views: 2745

Answers (1)

Tobi

Reputation: 31479

As you can't use "classic" (relational) JOINs in Elasticsearch, IMHO you can only choose between storing the (sub-)objects as flat objects, parent-child objects or nested objects in the index/type.

I think that you should consider the first two options. I personally would opt for flat objects, as they are easier to load, and they also get returned from the FB Graph API that way ("flat"). What you would have to add in your application is the mapping of the page to Company -> Country, because FB doesn't know about that.
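To illustrate why flat documents are easy to search, here is a minimal sketch of a search request body for such documents. The field names country and message match the sample code further down; the helper name buildPostSearch is hypothetical, and the sketch assumes that country is indexed as an exact (non-analyzed) value and that your Elasticsearch version supports the bool query with a filter clause.

```javascript
// Hypothetical helper: builds a search body for flat post documents.
// Assumes `country` is indexed as an exact value and `message` as full text.
function buildPostSearch(country, text) {
    return {
        query: {
            bool: {
                // Full-text match on the post message
                must: [{ match: { message: text } }],
                // Exact-value restriction to one country
                filter: [{ term: { country: country } }]
            }
        }
    };
}

console.log(JSON.stringify(buildPostSearch("United States", "pokemon"), null, 2));
```

Because company and country are stored on every post document, no join is needed: the restriction is just another clause in the query.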


As a query for the posts, you could use something like

/?ids={page1_id},{page2_id},{page3_id}&fields=id,posts.fields(id,message,created_time,link,picture,place,status_type,shares,likes.summary(true).limit(0),comments.summary(true).limit(0))

which will return something like

{
  "id": "7419689078",
  "posts": {
    "data": [
      {
        "id": "7419689078_10153348181604079",
        "message": "Gotta find them all in the real world soon.",
        "created_time": "2015-09-10T06:40:12+0000",
        "link": "http://venturebeat.com/2015/09/09/nintendo-takes-pokemon-into-mobile-gaming-in-partnership-with-google-niantic/",
        "picture": "https://fbexternal-a.akamaihd.net/safe_image.php?d=AQDvvzpCAM1WkJZS&w=130&h=130&url=http%3A%2F%2Fi0.wp.com%2Fventurebeat.com%2Fwp-content%2Fuploads%2F2013%2F04%2Fpokemon_mystery_dungeon_gates_to_infinity_art.jpg%3Ffit%3D780%252C9999&cfs=1",
        "status_type": "shared_story",
        "likes": {
          "data": [
          ],
          "summary": {
            "total_count": 0,
            "can_like": true,
            "has_liked": false
          }
        },
        "comments": {
          "data": [
          ],
          "summary": {
            "order": "ranked",
            "total_count": 0,
            "can_comment": true
          }
        }
      }
    ],
    "paging": {
      "previous": "https://graph.facebook.com/v2.4/7419689078/posts?fields=id,message,created_time,link,picture,place,status_type,shares,likes.summary%28true%29.limit%280%29,comments.summary%28true%29.limit%280%29&limit=1&since=1441867212&access_token=&__paging_token=&__previous=1",
      "next": "https://graph.facebook.com/v2.4/7419689078/posts?fields=id,message,created_time,link,picture,place,status_type,shares,likes.summary%28true%29.limit%280%29,comments.summary%28true%29.limit%280%29&limit=1&access_token=&until=1441867212&__paging_token="
    }
  }
}

You can then use some application-side JSON manipulation to

  • Add the Company -> Country -> Page mapping info to the JSON
  • Get rid of unwanted fields such as paging
  • Flatten the structure before saving (e.g. posts.data as posts)

before you save it to Elasticsearch. See the JSFiddle I prepared (fill in the access token!):

Then, you can use the bulk load feature to load the data to Elasticsearch:

Sample JavaScript code:

var pageMapping = {
    "venturebeat": {
        "country": "United States",
        "company": "Venture Beat"
    },
    "techcrunch": {
        "country": "United States",
        "company": "TechCrunch"
    }
};

//For bulk load 
var esInfo = {
    "index": "socialmedia",
    "type": "fbposts"
};

var accessToken = "!!!FILL_IN_HERE_BEFORE_EXECUTING!!!";

var requestUrl = "https://graph.facebook.com/?ids=venturebeat,techcrunch&fields=id,name,posts.fields(id,message,created_time,link,picture,place,status_type,shares,likes.summary(true).limit(0),comments.summary(true).limit(0)).limit(2)&access_token=" + accessToken;

$.getJSON(requestUrl, function(fbResponse) {

    //Array to store the bulk info for ES
    var bulkLoad = [];

    //Iterate over the pages
    Object.getOwnPropertyNames(fbResponse).forEach(function(page, idx, array) {
        var pageData = fbResponse[page];
        var pageId = pageData.id;

        pageData.posts.data.forEach(function(pagePostObj, idx, array) {
            var postObj = {};
            postObj.country = pageMapping[page].country;
            postObj.company = pageMapping[page].company;
            postObj.page_id = pageData.id;
            postObj.page_name = pageData.name;
            postObj.post_id = pagePostObj.id;
            postObj.message = pagePostObj.message;
            postObj.created_time = pagePostObj.created_time;
            postObj.link = pagePostObj.link;
            postObj.picture = pagePostObj.picture;
            postObj.place = pagePostObj.place;
            postObj.status_type = pagePostObj.status_type;
            //The shares field is omitted by the API when a post has no shares
            postObj.shares_count = pagePostObj.shares ? pagePostObj.shares.count : 0;
            postObj.likes_count = pagePostObj.likes.summary.total_count;
            postObj.comments_count = pagePostObj.comments.summary.total_count;

            //Push bulk load metadata
            bulkLoad.push({ "index" : { "_index": esInfo.index, "_type": esInfo.type } });
            //Push actual object data
            bulkLoad.push(postObj);

        });

    });

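    //The bulk API expects newline-delimited JSON (one object per line, with a trailing newline), not a JSON array
    var bulkBody = bulkLoad.map(function(entry) { return JSON.stringify(entry); }).join("\n") + "\n";

    //You can now take bulkBody and POST it to the Elasticsearch _bulk endpoint!
    console.log(bulkBody);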

});
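For the last step, sending the collected entries to Elasticsearch, something along these lines might work. This is only a sketch: toBulkBody and postBulk are hypothetical helper names, the esUrl value is an assumption about where your cluster runs, and fetch must be available in your environment. The sample entry omits _type because newer Elasticsearch versions have removed mapping types; on older versions you would keep it as in the code above.

```javascript
// Hypothetical helper: turns the alternating metadata/document array into
// the newline-delimited JSON body that the _bulk endpoint expects.
function toBulkBody(bulkLoad) {
    return bulkLoad.map(function (line) {
        return JSON.stringify(line);
    }).join("\n") + "\n"; // the body must end with a newline
}

// Hypothetical sender; esUrl (e.g. "http://localhost:9200") is an assumption.
function postBulk(esUrl, bulkLoad) {
    return fetch(esUrl + "/_bulk", {
        method: "POST",
        headers: { "Content-Type": "application/x-ndjson" },
        body: toBulkBody(bulkLoad)
    }).then(function (res) { return res.json(); });
}

// Small sample in the same shape as the bulkLoad array built above
var sample = [
    { index: { _index: "socialmedia" } },
    { post_id: "7419689078_10153348181604079", likes_count: 0 }
];
console.log(toBulkBody(sample));
```

The bulk response reports success or failure per item, so it is worth checking its errors flag rather than relying on the HTTP status alone.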

Upvotes: 2
