rg4s

Reputation: 897

How can I find elements with a varying class name in Python?

I am parsing metrics (likes, views, comments, dates) for articles and megaposts from a forum.

I am using Selenium and trying to get the publication datetime.

dates = []

page_items = len(drv.find_elements_by_class_name("tm-articles-list"))
for i in range(page_items):
    date_of_post = drv.find_elements_by_class_name("tm-article-snippet__datetime-published")
    for d in date_of_post:
        date_text = d.find_element_by_tag_name("time").text
        dates.append(date_text)

The problem is that regular articles and megaposts use different HTML class names. For articles the datetime class name is tm-article-snippet__datetime-published, and for megaposts it is tm-megapost-snippet__datetime-published. How can I parse the datetime regardless of which class is used?

I tried to do it through the logical expression: date_of_post = drv.find_elements_by_class_name("tm-article-snippet__datetime-published" or "tm-megapost-snippet__datetime-published") but obviously it does not work.
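The reason this fails can be checked in plain Python, without Selenium: the expression is evaluated before the method is called, so only one class name is ever passed. A minimal sketch (the variable names here are illustrative, not from the original code):

```python
article_cls = "tm-article-snippet__datetime-published"
megapost_cls = "tm-megapost-snippet__datetime-published"

# `or` returns its first truthy operand, and any non-empty string is truthy,
# so the second class name never reaches the locator call.
selector = article_cls or megapost_cls
print(selector)
```

So the call behaves exactly like the single-class version and never sees the megapost class.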

Important note: all megaposts on the forum are located inside the tm-articles-list element.

HTML for megaposts:

    <article id="424221" data-navigatable="" tabindex="0" class="tm-articles-list__item">
<div class="tm-megapost-snippet">
<div class="tm-megapost-snippet__wrapper" style="background: url(&quot;https://habrastorage.org/r/w780/getpro/tmtm/megapost/928/9f7/ad0/9289f7ad0d8e76bf87471d2dbf71401a.jpg&quot;) center center / cover no-repeat;">
<div class="tm-megapost-snippet__tint">
<header class="tm-megapost-snippet__header">
<a href="/ru/company/dins/" class="tm-megapost-snippet__link tm-megapost-snippet__company-blog router-link-active">
<span>Блог компании DINS</span>
</a>
<a href="/ru/article/424221/" class="tm-megapost-snippet__link tm-megapost-snippet__date">
<time datetime="2018-11-09T14:58:14.000Z" title="2018-11-09, 17:58" class="tm-megapost-snippet__datetime-published">9  ноября  2018</time>
</a>
</header>
<a href="/ru/article/424221/" class="tm-megapost-snippet__link tm-megapost-snippet__card">
<h2 class="tm-megapost-snippet__title">Жизнь С++</h2>
</a>
<ul class="tm-megapost-snippet__hubs">
<li class="tm-megapost-snippet__hub"><a href="/ru/hub/programming/" class="tm-megapost-snippet__link"><span>Программирование</span></a></li><li class="tm-megapost-snippet__hub"><a href="/ru/hub/read/" class="tm-megapost-snippet__link"><span>Читальный зал</span></a></li><li class="tm-megapost-snippet__hub"><a href="/ru/hub/history/" class="tm-megapost-snippet__link"><span>История IT</span></a></li><li class="tm-megapost-snippet__hub"><a href="/ru/hub/itcompanies/" class="tm-megapost-snippet__link"><span>IT-компании</span></a></li></ul></div></div><div class="tm-megapost-snippet__body"><div class="article-formatted-body article-formatted-body_version-1">IT-эволюция - шутка парадоксальная. Например, сначала на компьютерах моделировали нагрузку на АТС, затем программно управляли вызовами, а теперь телефония - это облачное решение, которое разворачивается за несколько минут и объединяет все корпоративные коммуникации. 
    
    Кажется, между этими изменениями мало общего. На самом деле они стали возможными благодаря принципам программирования, заложенным полвека назад. И чтобы лучше увидеть эту связь, мы решили вспомнить историю С++ - одного из самых “взрослых” языков программирования. Он может быть и удобным инструментом разработки, и ночным кошмаром, и частью корпоративной истории. std::begin( )
    </div><a href="/ru/article/424221/" class="tm-megapost-snippet__readmore"><span>Подробности — под катом</span></a></div></div><div class="tm-data-icons"><!----><div class="tm-votes-meter tm-data-icons__item"><svg height="16" width="16" class="tm-svg-img tm-votes-meter__icon tm-votes-meter__icon_small"><title>Всего голосов 72: ↑65 и ↓7</title><use xlink:href="/img/megazord-v24.cee85629.svg#counter-rating"></use></svg><span title="Всего голосов 72: ↑65 и ↓7" class="tm-votes-meter__value tm-votes-meter__value_positive tm-votes-meter__value_small">+58</span></div><span class="tm-icon-counter tm-data-icons__item" title="Количество просмотров"><svg height="16" width="16" class="tm-svg-img tm-icon-counter__icon"><title>Просмотры</title><use xlink:href="/img/megazord-v24.cee85629.svg#counter-views"></use></svg><span class="tm-icon-counter__value">43K</span></span><button title="Добавить в закладки" type="button" class="bookmarks-button tm-data-icons__item"><span title="Добавить в закладки" class="tm-svg-icon__wrapper bookmarks-button__icon"><svg height="16" width="16" class="tm-svg-img tm-svg-icon"><title>Добавить в закладки</title><use xlink:href="/img/megazord-v24.cee85629.svg#counter-favorite"></use></svg></span><span title="Количество пользователей, добавивших публикацию в закладки" class="bookmarks-button__counter">
        119
      </span></button><div class="tm-article-comments-counter-link tm-data-icons__item" title="Читать комментарии"><a href="/ru/company/dins/blog/424221/comments/" class="tm-article-comments-counter-link__link"><svg height="16" width="16" class="tm-svg-img tm-article-comments-counter-link__icon"><title>Комментарии</title><use xlink:href="/img/megazord-v24.cee85629.svg#counter-comments"></use></svg><span class="tm-article-comments-counter-link__value">
          189
        </span></a><a href="/ru/company/dins/blog/424221/comments/" class="tm-article-comments-counter-link__link"><span title="Читать новые комментарии" class="tm-article-comments-counter-link__unread-counter">
          +189
        </span></a></div><!----><div class="v-portal" style="display: none;"></div></div></article>

HTML for regular articles:

<article id="433166" data-navigatable="" tabindex="0" class="tm-articles-list__item">
<div class="tm-article-snippet">
<div class="tm-article-snippet__meta-container">
<div class="tm-article-snippet__meta">
<span class="tm-user-info tm-article-snippet__author"><a href="/ru/users/640509-040147/" class="tm-user-info__userpic" title="640509-040147">
<div class="tm-entity-image">
<svg height="24" width="24" class="tm-svg-img tm-image-placeholder tm-image-placeholder_pink"><!----><use xlink:href="/img/megazord-v24.cee85629.svg#placeholder-user"></use></svg></div></a><span class="tm-user-info__user"><a href="/ru/users/640509-040147/" class="tm-user-info__username">
      640509-040147
    </a>
</span></span>
<span class="tm-article-snippet__datetime-published">
<time datetime="2018-12-25T11:36:26.000Z" title="2018-12-25, 14:36">25  декабря  2018 в 14:36</time></span></div><!---->
</div>
<h2 class="tm-article-snippet__title tm-article-snippet__title_h2"><a href="/ru/company/dins/blog/433166/" class="tm-article-snippet__title-link" data-article-link=""><span>Предсказываем время решения тикета с помощью машинного обучения</span></a></h2><div class="tm-article-snippet__hubs"><span class="tm-article-snippet__hubs-item"><a href="/ru/company/dins/blog/" class="tm-article-snippet__hubs-item-link router-link-active"><span>Блог компании DINS</span><!----></a></span><span class="tm-article-snippet__hubs-item"><a href="/ru/hub/python/" class="tm-article-snippet__hubs-item-link"><span>Python</span><span title="Профильный хаб" class="tm-article-snippet__profiled-hub">*</span></a></span><span class="tm-article-snippet__hubs-item"><a href="/ru/hub/data_mining/" class="tm-article-snippet__hubs-item-link"><span>Data Mining</span><span title="Профильный хаб" class="tm-article-snippet__profiled-hub">*</span></a></span><span class="tm-article-snippet__hubs-item"><a href="/ru/hub/machine_learning/" class="tm-article-snippet__hubs-item-link"><span>Машинное обучение</span><span title="Профильный хаб" class="tm-article-snippet__profiled-hub">*</span></a></span></div><div class="tm-article-snippet__labels"><!----></div><!----><div class="tm-article-body tm-article-snippet__lead">

Sorry if this is a very simple, silly, or even duplicate question (I have not found anything related to the topic). I am more confident in R than in Python, but when I need to parse something, I go for Python :)

Upvotes: 0

Views: 88

Answers (1)

cruisepandey

Reputation: 29362

To parse the datetime regardless of the class name, you can use XPath.

XPath has an or operator:

//*[contains(@class, 'tm-article-snippet__datetime-published') or contains(@class, 'tm-megapost-snippet__datetime-published')]

Here * matches any element. In the HTML you posted, the class sits on a span for regular articles but on the time element itself for megaposts, so keep * rather than narrowing it to span.

Try the following:

date_of_post = drv.find_elements_by_xpath("//*[contains(@class, 'tm-article-snippet__datetime-published') or contains(@class, 'tm-megapost-snippet__datetime-published')]")
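If more snippet types appear later, the same expression can be generated from a list of class names rather than written by hand. This is a minimal sketch; the helper name build_any_class_xpath and the DATE_CLASSES list are hypothetical, and it assumes the class names contain no single quotes:

```python
def build_any_class_xpath(class_names, tag="*"):
    """Return an XPath matching elements whose class attribute
    contains at least one of the given names."""
    if not class_names:
        raise ValueError("at least one class name is required")
    # Assumption: names contain no single quotes, so no escaping is done.
    conditions = " or ".join(
        f"contains(@class, '{name}')" for name in class_names
    )
    return f"//{tag}[{conditions}]"


DATE_CLASSES = [
    "tm-article-snippet__datetime-published",
    "tm-megapost-snippet__datetime-published",
]
xpath = build_any_class_xpath(DATE_CLASSES)
print(xpath)
```

The resulting string can be passed to the XPath locator call in place of the literal expression.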

Update 1:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# driver is the WebDriver instance (drv in the question).
date_time = []
lst = driver.find_elements(By.XPATH, "//*[contains(@class, 'tm-article-snippet__datetime-published') or contains(@class, 'tm-megapost-snippet__datetime-published')]")
for item in lst:
    # Megaposts: the matched element is the <time> tag, so it carries datetime itself.
    value = item.get_attribute('datetime')
    if not value:
        # Regular articles: the matched <span> wraps a <time> tag, so look inside it.
        try:
            value = item.find_element(By.XPATH, ".//time").get_attribute('datetime')
        except NoSuchElementException:
            print("No <time> element found inside the matched node")
            continue
    date_time.append(value)
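The datetime attribute values in the question's HTML (for example 2018-11-09T14:58:14.000Z) are UTC timestamps with milliseconds, so once collected they can be turned into datetime objects for sorting or filtering. A minimal sketch, assuming every value follows exactly that format; parse_published and raw_values are hypothetical names:

```python
from datetime import datetime, timezone


def parse_published(value):
    """Parse a value such as '2018-11-09T14:58:14.000Z' into an aware UTC datetime."""
    # The trailing 'Z' is matched literally, then UTC is attached explicitly.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Sample values copied from the HTML in the question.
raw_values = ["2018-11-09T14:58:14.000Z", "2018-12-25T11:36:26.000Z"]
published = [parse_published(v) for v in raw_values]
print(published)
```

A value in any other format raises ValueError, which makes unexpected markup changes easy to notice.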

Upvotes: 2
