Diganta Misra

Reputation: 73

How to scrape the number of followers of questions from Quora using Ruby?

I am working on a project that scrapes questions from Quora for a given topic, using this repository as a foundation: https://github.com/Theminijohn/quora-scraper. As shown on that page, the follower count is extracted correctly for each question. When I run the same code on my system, however, the follower count is reported as zero for every question, even when the question has followers. The Followers column in the output CSV always contains 0.

The line responsible for extracting the number of followers is:

follower_count = q.css('.FollowActionItem .icon_action_bar-label span > span:last-child').text.to_i

Everything else is working as expected. What am I missing here?

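Edit: the whole code snippet is as follows:

require 'rubygems'
require 'ruby-progressbar'
require 'nokogiri' # the gem's file name is lowercase; 'Nokogiri' fails on case-sensitive filesystems
require 'csv'
require 'date'     # Date and DateTime are used for the timestamps below
require 'pry'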

ENGAGEMENT_THRESHOLD = 5

# init progressbar
progressbar = ProgressBar.create( format:         '%a %bᗧ%i %p%% %t',
                                  progress_mark:  ' ',
                                  remainder_mark: '・')

# parse file
doc = File.open("input.html") { |x| Nokogiri::HTML(x) }
questions = doc.css('.TopicAllQuestionsList .pagedlist_item')

# identifiers
canonical_link = doc.at('link[rel="canonical"]')['href']
topic_name = canonical_link.match(/quora.com\/topic\/(.*)/)[1]

# update progressbar
progressbar.total = questions.count

# prepare csv
unless File.exist?('quora-data.csv')
  CSV.open("quora-data.csv", "w+") do |csv|
    csv << [
      "Topic", "Title", "Followers", "Answers", "Ratio", "Engagement potential",
      "Last action", "Parsed time", "Question link"
    ]
  end
end

questions.each do |q|
  link = "https://www.quora.com" + q.css('a.question_link').attr('href').value
  title = q.css('a.question_link').text.strip
  answer_count = q.css('.QuestionFooter .answer_count_prominent').text.strip.to_i
  follower_count = q.css('.FollowActionItem .icon_action_bar-label span > span:last-child').text.to_i
  ratio = "#{follower_count}/#{answer_count}"

  if answer_count == 0
    take_action = (follower_count >= ENGAGEMENT_THRESHOLD) ? "Yes" : "No"
  else
    take_action = ((follower_count / answer_count) >= ENGAGEMENT_THRESHOLD) ? "Yes" : "No"
  end

  # timestamps
  raw_time = q.css('.QuestionFooter .question_timestamp').text.strip
  last_action = raw_time.include?("Last requested") ? "Requested" : "Followed"

  if raw_time.include?('ago')
    if raw_time.scan(/(\d*)h/).flatten.any?
      hours_ago = raw_time.scan(/(\d*)h/).flatten[0].to_f
      parsed_time = (DateTime.now - (hours_ago / 24)).strftime('%Y-%m-%d')
    elsif raw_time.scan(/(\d*)m/).flatten.any?
      minutes_ago = raw_time.scan(/(\d*)m/).flatten[0].to_f
      parsed_time = (DateTime.now - (minutes_ago / 24 / 60)).strftime('%Y-%m-%d')
    end
  else
    if raw_time.count("0-9") > 0
      parsed_time = Date.parse(raw_time).strftime("%Y-%m-%d")
    else
      parsed_time =
        (Date.today < Date.parse(raw_time)) ? (Date.parse(raw_time) - 7) : Date.parse(raw_time)
    end
  end

  CSV.open("quora-data.csv", "a+") do |csv|
    csv << [
      topic_name, title, follower_count, answer_count, ratio,
      take_action, last_action, parsed_time, link
    ]
  end

  # move progressbar
  progressbar.increment
end

<!DOCTYPE html>
<!-- saved from url=(0099)file:///C:/Users/DIGANTA/quora/quora-scraper/All%20Questions%20on%20Data%20Science%20-%20Quora.html -->
<html lang="en" class="js-wf-loaded"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="icon" href="https://qsf.fs.quoracdn.net/-3-images.favicon.ico-26-ae77b637b1e7ed2c.ico"><link rel="preload" as="font" type="font/woff2" crossorigin="anonymous" href="https://qsf.fs.quoracdn.net/-3-fonts.q-icons.q-icons.woff2-26-9afc20a49e3ef2cf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin="anonymous" href="https://qsf.fs.quoracdn.net/-3-fonts.q_serif.q_serif_regular.woff2-26-7ace3bc4cbe404d9.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin="anonymous" href="https://qsf.fs.quoracdn.net/-3-fonts.q_serif.q_serif_regular_italic.woff2-26-9d81ab3229809d01.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin="anonymous" href="https://qsf.fs.quoracdn.net/-3-fonts.q_serif.q_serif_semibold.woff2-26-b55bf39d9018ace9.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin="anonymous" href="https://qsf.fs.quoracdn.net/-3-fonts.q_serif.q_serif_semibold_italic.woff2-26-4c39f22524232bf2.woff2"><script src="./input_files/sdk.js.download" async="" crossorigin="anonymous"></script><script src="file:///C:/Users/DIGANTA/quora/quora-scraper/All%20Questions%20on%20Data%20Science%20-%20Quora_files/sdk.js.download" async="" crossorigin="anonymous"></script><script async="" src="file:///C:/Users/DIGANTA/quora/quora-scraper/All%20Questions%20on%20Data%20Science%20-%20Quora_files/analytics.js.download"></script><script type="text/javascript" async="" src="file:///C:/Users/DIGANTA/quora/quora-scraper/All%20Questions%20on%20Data%20Science%20-%20Quora_files/widgets.js.download"></script><script type="text/javascript" async="" src="file:///C:/Users/DIGANTA/quora/quora-scraper/All%20Questions%20on%20Data%20Science%20-%20Quora_files/sdk.js(1).download"></script><script type="text/javascript">window.Q = {"fontFamilies": ["q-icons", "q_serif"], "errorSamplingRate": 1.0, "revision": 
"41e9b4435b78728ddf351e72a6dc45ca9708ebc2", "subdomainSuffix": "quora.com"};window["webpackManifest"] = {"ads_manager": "https://qsc.fs.quoracdn.net/-3-chunk.web.ads_manager.js.out-34-1e09a2ca57288a3c.webpack", "content_widgets": "https://qsc.fs.quoracdn.net/-3-chunk.web.content_widgets.js.out-34-9a6c124eee999cb7.webpack", "dev": "https://qsc.fs.quoracdn.net/-3-chunk.web.dev.js.out-34-5d22ece0a38f03a1.webpack", "internal": "https://qsc.fs.quoracdn.net/-3-chunk.web.internal.js.out-34-2e41b1b9af1f0f88.webpack", "qtext2": "https://qsc.fs.quoracdn.net/-3-chunk.web.qtext2.js.out-34-b3d77df0693a06da.webpack", "main": "https://qsc.fs.quoracdn.net/-3-chunk.web.main.js.out-34-835b38fb05330b9f.webpack", "firebase": "https://qsc.fs.quoracdn.net/-3-chunk.web.firebase.js.out-34-eadc5f3144befc37.webpack", "publisher_dashboard": "https://qsc.fs.quoracdn.net/-3-chunk.web.publisher_dashboard.js.out-34-0c43bcc87e209b23.webpack"};window["webpackChunks"] = ["main"];window["PAGE_IS_MOBILE"] = false;var assetErrs=[];document.addEventListener("DOMContentLoaded",function(e){if(0!==assetErrs.length){var s="assets="+encodeURIComponent(JSON.stringify(assetErrs)),t=new XMLHttpRequest;t.open("POST","/ajax/log_browser_asset_load_error_3RD_PARTY_POST",!0),t.setRequestHeader("Content-Type","application/x-www-form-urlencoded; charset=UTF-8"),t.setRequestHeader("Accept","*/*"),t.send(s.replace(/%20/g,"+"))}}),window.addAssetErr=function(e){e&&assetErrs.push(e)};

The complete HTML file can be found here: https://drive.google.com/file/d/1_X86tq5TTw4ikk-hQ2Ixd13Y_hR4scBg/view?usp=sharing

The HTML containing the follower count is:

<div class="FollowActionItem ItemComponent primary_item u-relative"><span id="wVP1Ux4a11"><a class="ui_button ui_button--styled ui_button--FlatStyle ui_button--FlatStyle--gray ui_button--size_regular u-inline-block ui_button--non_link ui_button--supports_icon ui_button--has_icon" href="#" role="button" action_click="QuestionFollow" action_target="{&quot;qid&quot;: 44394942, &quot;type&quot;: &quot;question&quot;}" id="__w2_wVP1Ux4a27_button"><div class="ui_button_inner" id="__w2_wVP1Ux4a27_inner"><div class="ui_button_icon_wrapper u-relative u-flex-inline"><div id="__w2_wVP1Ux4a27_icon"><span class="ui_button_icon" aria-hidden="true"><svg width="24px" height="24px" viewBox="0 0 24 24" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <g stroke="none" fill="none" fill-rule="evenodd" stroke-linecap="round">
        <g id="follow" class="icon_svg-stroke" stroke="#666" stroke-width="1.5">
            <path d="M14.5,19 C14.5,13.3369229 11.1630771,10 5.5,10 M19.5,19 C19.5,10.1907689 14.3092311,5 5.5,5" id="lines"></path>
            <circle id="circle" cx="7.5" cy="17" r="2" class="icon_svg-fill" fill="none"></circle>
        </g>
    </g>
</svg></span></div></div><div class="ui_button_label_count_wrapper"><span class="ui_button_label" id="__w2_wVP1Ux4a27_label">Follow</span><span class="ui_button_count" aria-hidden="true" id="__w2_wVP1Ux4a27_count_wrapper"><span class="bullet"> · </span><span class="ui_button_count_inner" id="__w2_wVP1Ux4a27_count">1</span></span></div></div></a></span></div>

Upvotes: 2

Views: 308

Answers (0)