Tatiana

Reputation: 746

Is there a way to get articles from the VK API?

I am trying to get articles from a VK group, but I can't find any way to retrieve them through the VK API. Has anyone faced the same problem? Is there any chance of getting articles with the get method used for posts? (I'm using the vk_api Python package.)

Upvotes: 0

Views: 1333

Answers (1)

Manuel Cortez

Reputation: 11

Disclaimer: I am basically unable to fully understand the Russian-language VK API documentation.

Getting a single article

There seems to be no documented way to retrieve articles through the VK API. However, if you are already using Python and vk_api, you can use the HTTP session that the main class instantiates. This won't give you an article object, only the page's HTML, so you will have to parse it if you need the text. This is roughly what I use in my code:

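import vk_api

def fetch_article_html(login, password, article_url):
    """Log in to VK and return the raw HTML of an article page (None if login fails)."""
    vk_session = vk_api.VkApi(login, password)
    try:
        vk_session.auth(token_only=True)
    except vk_api.AuthError as error_msg:
        print(error_msg)
        return None
    # Note that the request is made with the vk_session object itself, not the API class.
    return vk_session.http.get(article_url).text

# login and password are your own VK credentials.
article_url = "https://vk.com/@riakatyusha-akademik-fortov-buduschee-budet-takim-kakim-my-ego-opredelim"
article_content = fetch_article_html(login, password, article_url)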

This should help you get started. From here you just need to process the HTML. Unfortunately, there is no documentation about articles on the VK methods page, so there is probably not much else we can do with articles.
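
As a starting point for processing the HTML, here is a minimal sketch that pulls plain text out of a page using only the standard library. The class name ArticleTextExtractor, the function extract_article_text and the choice of tags (h1, h2, h3, p) are my own assumptions; VK's article markup is not documented, so inspect a real page and adjust the tags to match.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect the text found inside heading and paragraph tags."""

    # Assumption: the article text lives in these tags; adjust after inspecting a page.
    TEXT_TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished text blocks, in document order
        self._current = None  # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.TEXT_TAGS:
            self._current = []

    def handle_data(self, data):
        # Only keep text while we are inside one of the tracked tags.
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.TEXT_TAGS and self._current is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
            self._current = None


def extract_article_text(html):
    """Return a list of text blocks (headings and paragraphs) from an HTML string."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

With the HTML from the snippet above, extract_article_text(article_content) would return the text blocks in order. BeautifulSoup would work just as well if you already have it installed.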

Extracting article URLs from a group or user page

Here is some code that should get you started with extracting articles from a user or community page. The only dependencies here are requests and bs4. I used the lxml parser because it is the fastest and I have it on my machine, but if you don't want it or don't have it, you can use another one, as suggested in the BeautifulSoup docs.

This really simple method should retrieve the 20 most recent articles posted in a group. I was unable to find a way to load more items; it looks like you would need to play with author_page.php, which seems difficult. You might find some inspiration in the audio class of vk_api, or you could ask on their GitHub.

Assuming you don't need to access private groups, here's the code. (I thought that making the get and post calls through the vk_api requests session would be enough to be logged in to VK, but it seems additional steps are required.)

import requests
from bs4 import BeautifulSoup
group_url = "https://m.vk.com/@riakatyusha"
body = requests.get(group_url)
soup = BeautifulSoup(body.text, "lxml")
articles_list = soup.find_all("div", class_="author-page-article")
for article in articles_list:
    # VK uses relative URLs for articles, so complete them first.
    url = article.a["href"]
    url = "https://m.vk.com"+url
    # Optionally, remove the GET parameters (such as context and ref) from the URL.
    url = url.split("?")[0]
    # Some extra information is also available in case you need it.
    title = article.find("span", class_="author-page-article__title").text
    summary = article.p.text
    print(title, summary, url)
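
If you prefer not to build the URL by hand, the same clean-up can be sketched with the standard library's urllib.parse. The helper name normalize_article_url and the BASE_URL constant are hypothetical names of mine, not part of vk_api or VK; the function resolves a relative link against the mobile site and drops the query string and fragment.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: links are resolved against the mobile site, as in the loop above.
BASE_URL = "https://m.vk.com"


def normalize_article_url(href, base=BASE_URL):
    """Turn a relative article link into an absolute URL without query or fragment."""
    absolute = urljoin(base, href)   # leaves already-absolute links untouched
    parts = urlsplit(absolute)
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Inside the loop this would replace the three URL lines with url = normalize_article_url(article.a["href"]).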

Upvotes: 1
