Boomshakalaka

Reputation: 521

Extract content from HTML data saved in a text file using BeautifulSoup

I am struggling to extract some of the content from an HTML file that I saved on my local desktop in .txt format.

The file includes many people's profiles and I am showing a sample:


My goal is to extract all green text for src, name, title, and introduction.

Sample HTML data:

<div _ngcontent-tbp-c85="" class="ng-tns-c85-14 ng-star-inserted"><div _ngcontent-tbp-c85="" class="ng-tns-c85-14"><app-profile-card _ngcontent-tbp-c85="" _nghost-tbp-c82="" class="ng-tns-c82-4511 ng-tns-c85-14 ng-star-inserted"><!----><div _ngcontent-tbp-c82="" data-test="selectProfileCard" class="card overflow-visible word-wrap selectedProfile ng-tns-c82-4511 ng-star-inserted"><!----><!----><div _ngcontent-tbp-c82="" class="card-content padding-half--bottom ng-tns-c82-4511 ng-star-inserted"><div _ngcontent-tbp-c82="" class="content ng-tns-c82-4511"><article _ngcontent-tbp-c82="" class="media ng-tns-c82-4511"><figure _ngcontent-tbp-c82="" class="media-left hidden-xs hidden-sm ng-tns-c82-4511"><p _ngcontent-tbp-c82="" placement="top" class="cursor-pointer ng-tns-c82-4511"><img _ngcontent-tbp-c82="" onerror="this.onerror=null; this.src='https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/image_placeholder.png'" alt="" class="profile-image ng-tns-c82-4511 ng-star-inserted" src="https://d1ew4vee5tqwao.cloudfront.net/things-images/67af0c730dc0c1266c3758fb1213d73f_1630606676.jpeg"><!----><!----></p></figure><div _ngcontent-tbp-c82="" class="media-content is-relative ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="col-xs-12 padding--none hidden-lg hidden-md center-everything ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="has-text-centered is-fullwidth ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="col-xs-12 padding--none center-everything ng-tns-c82-4511"><div _ngcontent-tbp-c82="" placement="top" class="cursor-pointer margin--bottom ng-tns-c82-4511"><img _ngcontent-tbp-c82="" onerror="this.onerror=null; this.src='https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/image_placeholder.png'" alt="" class="profile-image ng-tns-c82-4511 ng-star-inserted" src="https://d1ew4vee5tqwao.cloudfront.net/things-images/67af0c730dc0c1266c3758fb1213d73f_1630606676.jpeg"><!----><!----></div></div><a _ngcontent-tbp-c82="" apptextcolor="" 
class="link-is-positive ng-tns-c82-4511" style="color: rgb(0, 46, 225);"> Tori Combs </a><!----><!----><!----><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">Booth 1100</small><!----><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">•</small><!----><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">Sponsor</small><!----><!----><!----></p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511">Louisville, KY</small></p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"> Marketing Manager at GhostDraft </p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><span _ngcontent-tbp-c82="" placement="top" data-test="inPersonTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-in-person.svg" style="margin-right: 8px;" class="ng-tns-c82-4511"> In-person </span><!----><span _ngcontent-tbp-c82="" placement="top" data-test="virtualTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-virtual.svg" style="margin-right: 8px;" class="ng-tns-c82-4511"> Virtual </span><!----></p><!----><!----><!----></div></div><div _ngcontent-tbp-c82="" class="hidden-xs hidden-sm margin-half--bottom ng-tns-c82-4511"><!----><div _ngcontent-tbp-c82="" class="headline-padding ng-tns-c82-4511"><a _ngcontent-tbp-c82="" placement="top" class="ng-tns-c82-4511 link-is-positive" data-test="thingId5380205" style="color: rgb(0, 46, 225);"> Tori Combs </a><span _ngcontent-tbp-c82="" class="margin-half--bottom ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511">&nbsp;Booth 
1100</small></span><!----><!----><!----><!----><!----><!----><!----></div><div _ngcontent-tbp-c82="" class="ng-tns-c82-4511"><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 margin--none ng-star-inserted"> Marketing Manager at GhostDraft </p><!----><p _ngcontent-tbp-c82="" class="margin-none--bottom ng-tns-c82-4511 ng-star-inserted" style="margin-top: 0.2rem;"><span _ngcontent-tbp-c82="" placement="top" data-test="inPersonTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-in-person.svg" alt="" style="margin-right: 8px;" class="ng-tns-c82-4511"> In-person </span><!----><span _ngcontent-tbp-c82="" placement="top" data-test="virtualTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-virtual.svg" alt="" style="margin-right: 8px;" class="ng-tns-c82-4511"> Virtual </span><!----></p><!----><!----></div><!----></div></div></article><div _ngcontent-tbp-c82="" class="grey-divider padding-half--top padding-half--bottom ng-tns-c82-4511 cursor-pointer ng-star-inserted"><p _ngcontent-tbp-c82="" class="summary-text margin--none ng-tns-c82-4511">Summary</p><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">I'm the Marketing Manager at GhostDraft, a leading customer communications and digital experience platform designed to help carriers streamline the li</p><!----><!----></div><!----><div _ngcontent-tbp-c82="" class="grey-divider is-relative is-flex xalign-center margin--bottom ng-tns-c82-4511 cursor-pointer ng-star-inserted"><div _ngcontent-tbp-c82="" class="arrow-button center-everything ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/nav_arrow_down.png" alt="" class="ng-tns-c82-4511"></div><!----></div><!----><app-profile-actions _ngcontent-tbp-c82="" _nghost-tbp-c81="" 
class="ng-tns-c82-4511 ng-star-inserted"><!----><!----><!----><div _ngcontent-tbp-c81="" class="action-container card-actions is-fullwidth padding-half--bottom"><!----><div _ngcontent-tbp-c81="" class="is-flex is-relative ng-star-inserted"><div _ngcontent-tbp-c81="" class="rec-action is-flex padding--none"><a _ngcontent-tbp-c81="" appbackgroundhover10="" class="text is-1 link interested is-joined-right" data-test="thingYetToSwipeInterested5380205" style="border-color: rgb(0, 46, 225);"><app-handshake _ngcontent-tbp-c81="" class="center-everything"><svg xmlns="http://www.w3.org/2000/svg" height="20" width="24" viewBox="0 0 30 25"><g fill="none" fill-rule="evenodd"><path appSvgStrokeColor="" stroke-linecap="square" d="M9.5 2.5l5 4" style="stroke: rgb(0, 46, 225);"></path><path appSvgPathColor="" d="M23.544 17.31l-5.128 5.126a1.41 1.41 0 0 1-1.004.416c-.38 0-.736-.147-1.004-.416l-.796-.795-.812-.812-7.128-7.127-.813-.812-4.235-4.233L8.66 2.624l2.47 2.47.191.19.258.087 2.102.7.01.003-3.1 3.098a2.236 2.236 0 0 0-.569 2.181c.099.365.283.711.569.998a2.242 2.242 0 0 0 1.59.658c.575 0 1.15-.219 1.59-.658l2.395-2.394a1.425 1.425 0 0 1 2.016 0l3.747 3.745.812.812.804.804c.55.55.55 1.441 0 1.991zm-10.352 5.126a1.408 1.408 0 0 1-1.004.416 1.41 1.41 0 0 1-1.004-.416l-5.128-5.127a1.41 1.41 0 0 1 0-1.991l.803-.804 7.129 7.127-.796.795zm4.83-17.065l.258-.086.19-.192 2.471-2.47 6.035 6.034-4.236 4.233-3.85-3.85a2.43 2.43 0 0 0-3.434 0l-2.499 2.5a1.098 1.098 0 1 1-1.555-1.555l3.404-3.404.209-.209.726-.24.179-.06.369-.123 1.733-.578zm5.53 8.33L28.6 8.658 20.941 1l-3.283 3.281-2.826.942-2.891-.942L8.659 1 1 8.657l5.046 5.045-.802.804c-1 .998-1 2.617 0 3.615l5.128 5.128c.502.5 1.159.751 1.816.751s1.315-.25 1.816-.751l.796-.796.796.796c.5.5 1.158.751 1.816.751.657 0 1.314-.25 1.816-.751l5.128-5.128a2.557 2.557 0 0 0 0-3.615l-.804-.804z" style="fill: rgb(0, 46, 225);"></path></g></svg></app-handshake><span _ngcontent-tbp-c81="" apptextcolor="" class="normal-show" style="color: rgb(0, 46, 
225);">Show Interest</span></a><a _ngcontent-tbp-c81="" appbackgroundhover10="" class="swipe-message-button is-joined-left is-relative center-everything ng-star-inserted" style="min-width: 39px; border-left: 0px rgb(0, 46, 225); border-top-color: rgb(0, 46, 225); border-right-color: rgb(0, 46, 225); border-bottom-color: rgb(0, 46, 225);" data-test="showSwipeMenu5380205"><app-swipe-message-arrow _ngcontent-tbp-c81="" class="center-everything arrow-down"><svg id="swipe-intro-arrow" xmlns="http://www.w3.org/2000/svg" height="16" width="16" viewBox="0 0 98 98" x="0px" y="0px" appSvgPathColor="" style="fill: rgb(0, 46, 225);"><path stroke-width="6" d="M34.9,15.8a3,3,0,0,0-4.2.1,2.9,2.9,0,0,0,0,4.2L60.8,49,30.7,77.9a2.9,2.9,0,0,0,0,4.2,2.7,2.7,0,0,0,2.1.9,3.2,3.2,0,0,0,2.1-.8l32.4-31a3.1,3.1,0,0,0,0-4.4Z"></path></svg></app-swipe-message-arrow></a><!----></div><!----></div><!----><!----><!----><!----><div _ngcontent-tbp-c81="" class="rec-action is-inline-block padding--none primary-style alt-hover is-hoverable is-relative ng-star-inserted"><a _ngcontent-tbp-c81="" appbackgroundcolor="" class="text is-1 link normal-show" data-test="thingMeeting5380205" style="background-color: rgb(0, 46, 225);"><app-calendar-white _ngcontent-tbp-c81="" class="center-everything"><svg width="19px" height="21px" viewBox="0 0 19 21" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g id="Lists" transform="translate(-793.000000, -845.000000)" fill="#FFFFFF"><g id="Test" transform="translate(271.000000, 400.000000)"><g id="Meeting" transform="translate(508.000000, 438.000000)"><g id="Group-10" transform="translate(14.000000, 7.000000)"><path d="M12.285,4.13641172 C12.285,4.36088587 12.1182369,4.54758055 11.9098513,4.58629697 L11.83,4.59366391 C11.5787104,4.59366391 11.375,4.39627992 11.375,4.13641172 L11.375,2.85933429 L10.465,2.85933429 L10.464,2.858 L11.374,2.858 
L11.375,0.457252188 C11.375,0.232778046 11.5417631,0.0460833659 11.7501487,0.00736694494 L11.83,0 C12.0812896,0 12.285,0.197383996 12.285,0.457252188 L12.284,2.858 L13.194,2.858 L13.195,1.83746556 L16.6613828,1.83746556 C17.0693522,1.83746556 17.4606346,1.99942571 17.7492564,2.28775965 C18.0378472,2.57606262 18.2,2.96725815 18.2,3.37518368 L18.2,18.8152651 C18.2,19.1975106 18.0358425,19.5613489 17.7492564,19.8142917 C17.4587993,20.0706512 17.0847242,20.2121212 16.6973156,20.2121212 L1.44687579,20.2121212 C1.07006292,20.2121212 0.707100039,20.0700656 0.430391385,19.8142917 C0.156014499,19.5606733 -2.00529097e-14,19.2040068 -1.77635684e-14,18.8303692 L-1.77635684e-14,3.36095141 C-1.78125285e-14,2.96116197 0.15415672,2.57676831 0.430391385,2.28775965 C0.705285375,2.00015367 1.08583134,1.83746556 1.48368044,1.83746556 L5.005,1.83746556 L5.004,2.858 L1.80876543,2.85827977 C1.38185185,2.85827977 0.91,3.33680168 0.91,3.76567017 L0.91,7.91131007 L17.087,7.911 L17.0877778,3.76567017 C17.0877778,3.33680168 16.6159259,2.85827977 16.1890123,2.85827977 L13.194,2.858 L13.195,2.85933429 L12.285,2.85933429 L12.285,4.13641172 Z M17.0898006,7.91131007 L17.087,7.911 L17.087,8.83 L0.91,8.83004285 L0.91,18.3070414 L0.91,18.3070414 C0.91,18.9375244 1.40209611,19.191307 1.80876543,19.191307 L16.1890123,19.191307 L16.1890123,19.191307 C16.5371471,19.191307 17.0877778,18.9016241 17.0877778,18.3070414 L17.087,8.83 L17.0898006,8.83004285 L17.0898006,7.91131007 Z M6.37,0 C6.62128956,0 6.825,0.197383996 6.825,0.457252188 L6.824,2.858 L7.734,2.858 L7.735,1.83746556 L10.465,1.83746556 L10.464,2.858 L7.734,2.858 L7.735,2.85933429 L6.825,2.85933429 L6.825,4.13641172 C6.825,4.36088587 6.65823688,4.54758055 6.44985125,4.58629697 L6.37,4.59366391 C6.11871044,4.59366391 5.915,4.39627992 5.915,4.13641172 L5.915,2.85933429 L5.005,2.85933429 L5.004,2.858 L5.914,2.858 L5.915,0.457252188 C5.915,0.232778046 6.08176312,0.0460833659 6.29014875,0.00736694494 L6.37,0 Z" 
id="Combined-Shape-Copy-3"></path></g></g></g></g></g></svg></app-calendar-white><span _ngcontent-tbp-c81="">Request a meeting</span><img _ngcontent-tbp-c81="" src="assets/icons/chevron-white.svg" alt="" class="is-hidden-mobile margin--none arrow-down"></a><a _ngcontent-tbp-c81="" appbackgroundhover75="" class="text is-1 link hover-show" style="background-color: rgb(0, 46, 225);"><app-calendar-white _ngcontent-tbp-c81="" class="center-everything"><svg width="19px" height="21px" viewBox="0 0 19 21" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g id="Lists" transform="translate(-793.000000, -845.000000)" fill="#FFFFFF"><g id="Test" transform="translate(271.000000, 400.000000)"><g id="Meeting" transform="translate(508.000000, 438.000000)"><g id="Group-10" transform="translate(14.000000, 7.000000)"><path d="M12.285,4.13641172 C12.285,4.36088587 12.1182369,4.54758055 11.9098513,4.58629697 L11.83,4.59366391 C11.5787104,4.59366391 11.375,4.39627992 11.375,4.13641172 L11.375,2.85933429 L10.465,2.85933429 L10.464,2.858 L11.374,2.858 L11.375,0.457252188 C11.375,0.232778046 11.5417631,0.0460833659 11.7501487,0.00736694494 L11.83,0 C12.0812896,0 12.285,0.197383996 12.285,0.457252188 L12.284,2.858 L13.194,2.858 L13.195,1.83746556 L16.6613828,1.83746556 C17.0693522,1.83746556 17.4606346,1.99942571 17.7492564,2.28775965 C18.0378472,2.57606262 18.2,2.96725815 18.2,3.37518368 L18.2,18.8152651 C18.2,19.1975106 18.0358425,19.5613489 17.7492564,19.8142917 C17.4587993,20.0706512 17.0847242,20.2121212 16.6973156,20.2121212 L1.44687579,20.2121212 C1.07006292,20.2121212 0.707100039,20.0700656 0.430391385,19.8142917 C0.156014499,19.5606733 -2.00529097e-14,19.2040068 -1.77635684e-14,18.8303692 L-1.77635684e-14,3.36095141 C-1.78125285e-14,2.96116197 0.15415672,2.57676831 0.430391385,2.28775965 C0.705285375,2.00015367 1.08583134,1.83746556 1.48368044,1.83746556 
L5.005,1.83746556 L5.004,2.858 L1.80876543,2.85827977 C1.38185185,2.85827977 0.91,3.33680168 0.91,3.76567017 L0.91,7.91131007 L17.087,7.911 L17.0877778,3.76567017 C17.0877778,3.33680168 16.6159259,2.85827977 16.1890123,2.85827977 L13.194,2.858 L13.195,2.85933429 L12.285,2.85933429 L12.285,4.13641172 Z M17.0898006,7.91131007 L17.087,7.911 L17.087,8.83 L0.91,8.83004285 L0.91,18.3070414 L0.91,18.3070414 C0.91,18.9375244 1.40209611,19.191307 1.80876543,19.191307 L16.1890123,19.191307 L16.1890123,19.191307 C16.5371471,19.191307 17.0877778,18.9016241 17.0877778,18.3070414 L17.087,8.83 L17.0898006,8.83004285 L17.0898006,7.91131007 Z M6.37,0 C6.62128956,0 6.825,0.197383996 6.825,0.457252188 L6.824,2.858 L7.734,2.858 L7.735,1.83746556 L10.465,1.83746556 L10.464,2.858 L7.734,2.858 L7.735,2.85933429 L6.825,2.85933429 L6.825,4.13641172 C6.825,4.36088587 6.65823688,4.54758055 6.44985125,4.58629697 L6.37,4.59366391 C6.11871044,4.59366391 5.915,4.39627992 5.915,4.13641172 L5.915,2.85933429 L5.005,2.85933429 L5.004,2.858 L5.914,2.858 L5.915,0.457252188 C5.915,0.232778046 6.08176312,0.0460833659 6.29014875,0.00736694494 L6.37,0 Z" id="Combined-Shape-Copy-3"></path></g></g></g></g></g></svg></app-calendar-white><span _ngcontent-tbp-c81="">Request a meeting</span><img _ngcontent-tbp-c81="" src="assets/icons/chevron-white.svg" alt="" class="is-hidden-mobile margin--none arrow-down"></a><!----></div><!----><!----><div _ngcontent-tbp-c81="" class="rec-action skip is-inline-block padding--none skip-corner ng-star-inserted" data-test="skipThing5380205"><a _ngcontent-tbp-c81="" class="text is-1 link is-hoverable skip"><svg _ngcontent-tbp-c81="" xmlns="http://www.w3.org/2000/svg" height="20" width="20" viewBox="0 0 22 22" class="normal-show"><path _ngcontent-tbp-c81="" fill="none" fill-rule="evenodd" stroke="#59566B" stroke-width=".92" d="M11 21.46C5.223 21.46.54 16.777.54 11 .54 5.223 5.223.54 11 .54 16.777.54 21.46 5.223 21.46 11c0 5.777-4.683 10.46-10.46 10.46zm-.05-10.035l3.26 
3.26a.035.035 0 1 1 .05-.05l-6.47-6.47a.035.035 0 0 1-.05.05l3.21 3.21-3.21 3.21a.035.035 0 0 1 .05.05l6.47-6.47a.035.035 0 1 1-.05-.05l-3.26 3.26z"></path></svg><svg _ngcontent-tbp-c81="" xmlns="http://www.w3.org/2000/svg" height="20" width="20" viewBox="0 0 22 22" class="hover-show"><path _ngcontent-tbp-c81="" fill="none" fill-rule="evenodd" appSvgStrokeColor="" stroke-width=".92" d="M11 21.46C5.223 21.46.54 16.777.54 11 .54 5.223 5.223.54 11 .54 16.777.54 21.46 5.223 21.46 11c0 5.777-4.683 10.46-10.46 10.46zm-.05-10.035l3.26 3.26a.035.035 0 1 1 .05-.05l-6.47-6.47a.035.035 0 0 1-.05.05l3.21 3.21-3.21 3.21a.035.035 0 0 1 .05.05l6.47-6.47a.035.035 0 1 1-.05-.05l-3.26 3.26z" style="stroke: rgb(0, 46, 225);"></path></svg><span _ngcontent-tbp-c81="" class="normal-show">Skip</span><span _ngcontent-tbp-c81="" apptextcolor="" class="hover-show" style="color: rgb(0, 46, 225);">Skip</span></a></div><!----><!----><!----><!----><!----></div></app-profile-actions><!----><!----></div></div><!----><!----></div><!----></app-profile-card><!----></div></div

Here is the code I have written so far:

photo = soup.find(attrs={"class": "profile-image ng-tns-c82-4511 ng-star-inserted"})
name = soup.find(attrs={"class": "link-is-positive ng-tns-c82-4511"})
title = soup.find(attrs={"class": "ng-tns-c82-4511 margin--none ng-star-inserted"})
intro = soup.find_all(attrs={"class": "summary-text margin--none ng-tns-c82-4511"})

However, the code to extract intro is not working: it does not return the introduction text. Is there any way to extract clean text for all of the required content?

Any thoughts are appreciated! Thank you!

Upvotes: 1

Views: 128

Answers (1)

Andrej Kesely

Reputation: 195438

Try:

from bs4 import BeautifulSoup

html_doc = """
<div _ngcontent-tbp-c85="" class="ng-tns-c85-14 ng-star-inserted"><div _ngcontent-tbp-c85="" class="ng-tns-c85-14"><app-profile-card _ngcontent-tbp-c85="" _nghost-tbp-c82="" class="ng-tns-c82-4511 ng-tns-c85-14 ng-star-inserted"><!----><div _ngcontent-tbp-c82="" data-test="selectProfileCard" class="card overflow-visible word-wrap selectedProfile ng-tns-c82-4511 ng-star-inserted"><!----><!----><div _ngcontent-tbp-c82="" class="card-content padding-half--bottom ng-tns-c82-4511 ng-star-inserted"><div _ngcontent-tbp-c82="" class="content ng-tns-c82-4511"><article _ngcontent-tbp-c82="" class="media ng-tns-c82-4511"><figure _ngcontent-tbp-c82="" class="media-left hidden-xs hidden-sm ng-tns-c82-4511"><p _ngcontent-tbp-c82="" placement="top" class="cursor-pointer ng-tns-c82-4511"><img _ngcontent-tbp-c82="" onerror="this.onerror=null; this.src='https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/image_placeholder.png'" alt="" class="profile-image ng-tns-c82-4511 ng-star-inserted" src="https://d1ew4vee5tqwao.cloudfront.net/things-images/67af0c730dc0c1266c3758fb1213d73f_1630606676.jpeg"><!----><!----></p></figure><div _ngcontent-tbp-c82="" class="media-content is-relative ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="col-xs-12 padding--none hidden-lg hidden-md center-everything ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="has-text-centered is-fullwidth ng-tns-c82-4511"><div _ngcontent-tbp-c82="" class="col-xs-12 padding--none center-everything ng-tns-c82-4511"><div _ngcontent-tbp-c82="" placement="top" class="cursor-pointer margin--bottom ng-tns-c82-4511"><img _ngcontent-tbp-c82="" onerror="this.onerror=null; this.src='https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/image_placeholder.png'" alt="" class="profile-image ng-tns-c82-4511 ng-star-inserted" src="https://d1ew4vee5tqwao.cloudfront.net/things-images/67af0c730dc0c1266c3758fb1213d73f_1630606676.jpeg"><!----><!----></div></div><a _ngcontent-tbp-c82="" apptextcolor="" 
class="link-is-positive ng-tns-c82-4511" style="color: rgb(0, 46, 225);"> Tori Combs </a><!----><!----><!----><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">Booth 1100</small><!----><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">•</small><!----><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">Sponsor</small><!----><!----><!----></p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511">Louisville, KY</small></p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"> Marketing Manager at GhostDraft </p><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted"><span _ngcontent-tbp-c82="" placement="top" data-test="inPersonTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-in-person.svg" style="margin-right: 8px;" class="ng-tns-c82-4511"> In-person </span><!----><span _ngcontent-tbp-c82="" placement="top" data-test="virtualTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-virtual.svg" style="margin-right: 8px;" class="ng-tns-c82-4511"> Virtual </span><!----></p><!----><!----><!----></div></div><div _ngcontent-tbp-c82="" class="hidden-xs hidden-sm margin-half--bottom ng-tns-c82-4511"><!----><div _ngcontent-tbp-c82="" class="headline-padding ng-tns-c82-4511"><a _ngcontent-tbp-c82="" placement="top" class="ng-tns-c82-4511 link-is-positive" data-test="thingId5380205" style="color: rgb(0, 46, 225);"> Tori Combs </a><span _ngcontent-tbp-c82="" class="margin-half--bottom ng-tns-c82-4511 ng-star-inserted"><small _ngcontent-tbp-c82="" class="ng-tns-c82-4511">&nbsp;Booth 
1100</small></span><!----><!----><!----><!----><!----><!----><!----></div><div _ngcontent-tbp-c82="" class="ng-tns-c82-4511"><!----><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 margin--none ng-star-inserted"> Marketing Manager at GhostDraft </p><!----><p _ngcontent-tbp-c82="" class="margin-none--bottom ng-tns-c82-4511 ng-star-inserted" style="margin-top: 0.2rem;"><span _ngcontent-tbp-c82="" placement="top" data-test="inPersonTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-in-person.svg" alt="" style="margin-right: 8px;" class="ng-tns-c82-4511"> In-person </span><!----><span _ngcontent-tbp-c82="" placement="top" data-test="virtualTag" class="tag cursor-pointer agenda-label margin-quarter--right ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="assets/icons/hybrid-virtual.svg" alt="" style="margin-right: 8px;" class="ng-tns-c82-4511"> Virtual </span><!----></p><!----><!----></div><!----></div></div></article><div _ngcontent-tbp-c82="" class="grey-divider padding-half--top padding-half--bottom ng-tns-c82-4511 cursor-pointer ng-star-inserted"><p _ngcontent-tbp-c82="" class="summary-text margin--none ng-tns-c82-4511">Summary</p><p _ngcontent-tbp-c82="" class="ng-tns-c82-4511 ng-star-inserted">I'm the Marketing Manager at GhostDraft, a leading customer communications and digital experience platform designed to help carriers streamline the li</p><!----><!----></div><!----><div _ngcontent-tbp-c82="" class="grey-divider is-relative is-flex xalign-center margin--bottom ng-tns-c82-4511 cursor-pointer ng-star-inserted"><div _ngcontent-tbp-c82="" class="arrow-button center-everything ng-tns-c82-4511 ng-star-inserted"><img _ngcontent-tbp-c82="" src="https://d304g80if9nu2q.cloudfront.net/grip/static_web_images/nav_arrow_down.png" alt="" class="ng-tns-c82-4511"></div><!----></div><!----><app-profile-actions _ngcontent-tbp-c82="" _nghost-tbp-c81="" 
class="ng-tns-c82-4511 ng-star-inserted"><!----><!----><!----><div _ngcontent-tbp-c81="" class="action-container card-actions is-fullwidth padding-half--bottom"><!----><div _ngcontent-tbp-c81="" class="is-flex is-relative ng-star-inserted"><div _ngcontent-tbp-c81="" class="rec-action is-flex padding--none"><a _ngcontent-tbp-c81="" appbackgroundhover10="" class="text is-1 link interested is-joined-right" data-test="thingYetToSwipeInterested5380205" style="border-color: rgb(0, 46, 225);"><app-handshake _ngcontent-tbp-c81="" class="center-everything"><svg xmlns="http://www.w3.org/2000/svg" height="20" width="24" viewBox="0 0 30 25"><g fill="none" fill-rule="evenodd"><path appSvgStrokeColor="" stroke-linecap="square" d="M9.5 2.5l5 4" style="stroke: rgb(0, 46, 225);"></path><path appSvgPathColor="" d="M23.544 17.31l-5.128 5.126a1.41 1.41 0 0 1-1.004.416c-.38 0-.736-.147-1.004-.416l-.796-.795-.812-.812-7.128-7.127-.813-.812-4.235-4.233L8.66 2.624l2.47 2.47.191.19.258.087 2.102.7.01.003-3.1 3.098a2.236 2.236 0 0 0-.569 2.181c.099.365.283.711.569.998a2.242 2.242 0 0 0 1.59.658c.575 0 1.15-.219 1.59-.658l2.395-2.394a1.425 1.425 0 0 1 2.016 0l3.747 3.745.812.812.804.804c.55.55.55 1.441 0 1.991zm-10.352 5.126a1.408 1.408 0 0 1-1.004.416 1.41 1.41 0 0 1-1.004-.416l-5.128-5.127a1.41 1.41 0 0 1 0-1.991l.803-.804 7.129 7.127-.796.795zm4.83-17.065l.258-.086.19-.192 2.471-2.47 6.035 6.034-4.236 4.233-3.85-3.85a2.43 2.43 0 0 0-3.434 0l-2.499 2.5a1.098 1.098 0 1 1-1.555-1.555l3.404-3.404.209-.209.726-.24.179-.06.369-.123 1.733-.578zm5.53 8.33L28.6 8.658 20.941 1l-3.283 3.281-2.826.942-2.891-.942L8.659 1 1 8.657l5.046 5.045-.802.804c-1 .998-1 2.617 0 3.615l5.128 5.128c.502.5 1.159.751 1.816.751s1.315-.25 1.816-.751l.796-.796.796.796c.5.5 1.158.751 1.816.751.657 0 1.314-.25 1.816-.751l5.128-5.128a2.557 2.557 0 0 0 0-3.615l-.804-.804z" style="fill: rgb(0, 46, 225);"></path></g></svg></app-handshake><span _ngcontent-tbp-c81="" apptextcolor="" class="normal-show" style="color: rgb(0, 46, 
225);">Show Interest</span></a><a _ngcontent-tbp-c81="" appbackgroundhover10="" class="swipe-message-button is-joined-left is-relative center-everything ng-star-inserted" style="min-width: 39px; border-left: 0px rgb(0, 46, 225); border-top-color: rgb(0, 46, 225); border-right-color: rgb(0, 46, 225); border-bottom-color: rgb(0, 46, 225);" data-test="showSwipeMenu5380205"><app-swipe-message-arrow _ngcontent-tbp-c81="" class="center-everything arrow-down"><svg id="swipe-intro-arrow" xmlns="http://www.w3.org/2000/svg" height="16" width="16" viewBox="0 0 98 98" x="0px" y="0px" appSvgPathColor="" style="fill: rgb(0, 46, 225);"><path stroke-width="6" d="M34.9,15.8a3,3,0,0,0-4.2.1,2.9,2.9,0,0,0,0,4.2L60.8,49,30.7,77.9a2.9,2.9,0,0,0,0,4.2,2.7,2.7,0,0,0,2.1.9,3.2,3.2,0,0,0,2.1-.8l32.4-31a3.1,3.1,0,0,0,0-4.4Z"></path></svg></app-swipe-message-arrow></a><!----></div><!----></div><!----><!----><!----><!----><div _ngcontent-tbp-c81="" class="rec-action is-inline-block padding--none primary-style alt-hover is-hoverable is-relative ng-star-inserted"><a _ngcontent-tbp-c81="" appbackgroundcolor="" class="text is-1 link normal-show" data-test="thingMeeting5380205" style="background-color: rgb(0, 46, 225);"><app-calendar-white _ngcontent-tbp-c81="" class="center-everything"><svg width="19px" height="21px" viewBox="0 0 19 21" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g id="Lists" transform="translate(-793.000000, -845.000000)" fill="#FFFFFF"><g id="Test" transform="translate(271.000000, 400.000000)"><g id="Meeting" transform="translate(508.000000, 438.000000)"><g id="Group-10" transform="translate(14.000000, 7.000000)"><path d="M12.285,4.13641172 C12.285,4.36088587 12.1182369,4.54758055 11.9098513,4.58629697 L11.83,4.59366391 C11.5787104,4.59366391 11.375,4.39627992 11.375,4.13641172 L11.375,2.85933429 L10.465,2.85933429 L10.464,2.858 L11.374,2.858 
L11.375,0.457252188 C11.375,0.232778046 11.5417631,0.0460833659 11.7501487,0.00736694494 L11.83,0 C12.0812896,0 12.285,0.197383996 12.285,0.457252188 L12.284,2.858 L13.194,2.858 L13.195,1.83746556 L16.6613828,1.83746556 C17.0693522,1.83746556 17.4606346,1.99942571 17.7492564,2.28775965 C18.0378472,2.57606262 18.2,2.96725815 18.2,3.37518368 L18.2,18.8152651 C18.2,19.1975106 18.0358425,19.5613489 17.7492564,19.8142917 C17.4587993,20.0706512 17.0847242,20.2121212 16.6973156,20.2121212 L1.44687579,20.2121212 C1.07006292,20.2121212 0.707100039,20.0700656 0.430391385,19.8142917 C0.156014499,19.5606733 -2.00529097e-14,19.2040068 -1.77635684e-14,18.8303692 L-1.77635684e-14,3.36095141 C-1.78125285e-14,2.96116197 0.15415672,2.57676831 0.430391385,2.28775965 C0.705285375,2.00015367 1.08583134,1.83746556 1.48368044,1.83746556 L5.005,1.83746556 L5.004,2.858 L1.80876543,2.85827977 C1.38185185,2.85827977 0.91,3.33680168 0.91,3.76567017 L0.91,7.91131007 L17.087,7.911 L17.0877778,3.76567017 C17.0877778,3.33680168 16.6159259,2.85827977 16.1890123,2.85827977 L13.194,2.858 L13.195,2.85933429 L12.285,2.85933429 L12.285,4.13641172 Z M17.0898006,7.91131007 L17.087,7.911 L17.087,8.83 L0.91,8.83004285 L0.91,18.3070414 L0.91,18.3070414 C0.91,18.9375244 1.40209611,19.191307 1.80876543,19.191307 L16.1890123,19.191307 L16.1890123,19.191307 C16.5371471,19.191307 17.0877778,18.9016241 17.0877778,18.3070414 L17.087,8.83 L17.0898006,8.83004285 L17.0898006,7.91131007 Z M6.37,0 C6.62128956,0 6.825,0.197383996 6.825,0.457252188 L6.824,2.858 L7.734,2.858 L7.735,1.83746556 L10.465,1.83746556 L10.464,2.858 L7.734,2.858 L7.735,2.85933429 L6.825,2.85933429 L6.825,4.13641172 C6.825,4.36088587 6.65823688,4.54758055 6.44985125,4.58629697 L6.37,4.59366391 C6.11871044,4.59366391 5.915,4.39627992 5.915,4.13641172 L5.915,2.85933429 L5.005,2.85933429 L5.004,2.858 L5.914,2.858 L5.915,0.457252188 C5.915,0.232778046 6.08176312,0.0460833659 6.29014875,0.00736694494 L6.37,0 Z" 
id="Combined-Shape-Copy-3"></path></g></g></g></g></g></svg></app-calendar-white><span _ngcontent-tbp-c81="">Request a meeting</span><img _ngcontent-tbp-c81="" src="assets/icons/chevron-white.svg" alt="" class="is-hidden-mobile margin--none arrow-down"></a><a _ngcontent-tbp-c81="" appbackgroundhover75="" class="text is-1 link hover-show" style="background-color: rgb(0, 46, 225);"><app-calendar-white _ngcontent-tbp-c81="" class="center-everything"><svg width="19px" height="21px" viewBox="0 0 19 21" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g id="Lists" transform="translate(-793.000000, -845.000000)" fill="#FFFFFF"><g id="Test" transform="translate(271.000000, 400.000000)"><g id="Meeting" transform="translate(508.000000, 438.000000)"><g id="Group-10" transform="translate(14.000000, 7.000000)"><path d="M12.285,4.13641172 C12.285,4.36088587 12.1182369,4.54758055 11.9098513,4.58629697 L11.83,4.59366391 C11.5787104,4.59366391 11.375,4.39627992 11.375,4.13641172 L11.375,2.85933429 L10.465,2.85933429 L10.464,2.858 L11.374,2.858 L11.375,0.457252188 C11.375,0.232778046 11.5417631,0.0460833659 11.7501487,0.00736694494 L11.83,0 C12.0812896,0 12.285,0.197383996 12.285,0.457252188 L12.284,2.858 L13.194,2.858 L13.195,1.83746556 L16.6613828,1.83746556 C17.0693522,1.83746556 17.4606346,1.99942571 17.7492564,2.28775965 C18.0378472,2.57606262 18.2,2.96725815 18.2,3.37518368 L18.2,18.8152651 C18.2,19.1975106 18.0358425,19.5613489 17.7492564,19.8142917 C17.4587993,20.0706512 17.0847242,20.2121212 16.6973156,20.2121212 L1.44687579,20.2121212 C1.07006292,20.2121212 0.707100039,20.0700656 0.430391385,19.8142917 C0.156014499,19.5606733 -2.00529097e-14,19.2040068 -1.77635684e-14,18.8303692 L-1.77635684e-14,3.36095141 C-1.78125285e-14,2.96116197 0.15415672,2.57676831 0.430391385,2.28775965 C0.705285375,2.00015367 1.08583134,1.83746556 1.48368044,1.83746556 
L5.005,1.83746556 L5.004,2.858 L1.80876543,2.85827977 C1.38185185,2.85827977 0.91,3.33680168 0.91,3.76567017 L0.91,7.91131007 L17.087,7.911 L17.0877778,3.76567017 C17.0877778,3.33680168 16.6159259,2.85827977 16.1890123,2.85827977 L13.194,2.858 L13.195,2.85933429 L12.285,2.85933429 L12.285,4.13641172 Z M17.0898006,7.91131007 L17.087,7.911 L17.087,8.83 L0.91,8.83004285 L0.91,18.3070414 L0.91,18.3070414 C0.91,18.9375244 1.40209611,19.191307 1.80876543,19.191307 L16.1890123,19.191307 L16.1890123,19.191307 C16.5371471,19.191307 17.0877778,18.9016241 17.0877778,18.3070414 L17.087,8.83 L17.0898006,8.83004285 L17.0898006,7.91131007 Z M6.37,0 C6.62128956,0 6.825,0.197383996 6.825,0.457252188 L6.824,2.858 L7.734,2.858 L7.735,1.83746556 L10.465,1.83746556 L10.464,2.858 L7.734,2.858 L7.735,2.85933429 L6.825,2.85933429 L6.825,4.13641172 C6.825,4.36088587 6.65823688,4.54758055 6.44985125,4.58629697 L6.37,4.59366391 C6.11871044,4.59366391 5.915,4.39627992 5.915,4.13641172 L5.915,2.85933429 L5.005,2.85933429 L5.004,2.858 L5.914,2.858 L5.915,0.457252188 C5.915,0.232778046 6.08176312,0.0460833659 6.29014875,0.00736694494 L6.37,0 Z" id="Combined-Shape-Copy-3"></path></g></g></g></g></g></svg></app-calendar-white><span _ngcontent-tbp-c81="">Request a meeting</span><img _ngcontent-tbp-c81="" src="assets/icons/chevron-white.svg" alt="" class="is-hidden-mobile margin--none arrow-down"></a><!----></div><!----><!----><div _ngcontent-tbp-c81="" class="rec-action skip is-inline-block padding--none skip-corner ng-star-inserted" data-test="skipThing5380205"><a _ngcontent-tbp-c81="" class="text is-1 link is-hoverable skip"><svg _ngcontent-tbp-c81="" xmlns="http://www.w3.org/2000/svg" height="20" width="20" viewBox="0 0 22 22" class="normal-show"><path _ngcontent-tbp-c81="" fill="none" fill-rule="evenodd" stroke="#59566B" stroke-width=".92" d="M11 21.46C5.223 21.46.54 16.777.54 11 .54 5.223 5.223.54 11 .54 16.777.54 21.46 5.223 21.46 11c0 5.777-4.683 10.46-10.46 10.46zm-.05-10.035l3.26 
3.26a.035.035 0 1 1 .05-.05l-6.47-6.47a.035.035 0 0 1-.05.05l3.21 3.21-3.21 3.21a.035.035 0 0 1 .05.05l6.47-6.47a.035.035 0 1 1-.05-.05l-3.26 3.26z"></path></svg><svg _ngcontent-tbp-c81="" xmlns="http://www.w3.org/2000/svg" height="20" width="20" viewBox="0 0 22 22" class="hover-show"><path _ngcontent-tbp-c81="" fill="none" fill-rule="evenodd" appSvgStrokeColor="" stroke-width=".92" d="M11 21.46C5.223 21.46.54 16.777.54 11 .54 5.223 5.223.54 11 .54 16.777.54 21.46 5.223 21.46 11c0 5.777-4.683 10.46-10.46 10.46zm-.05-10.035l3.26 3.26a.035.035 0 1 1 .05-.05l-6.47-6.47a.035.035 0 0 1-.05.05l3.21 3.21-3.21 3.21a.035.035 0 0 1 .05.05l6.47-6.47a.035.035 0 1 1-.05-.05l-3.26 3.26z" style="stroke: rgb(0, 46, 225);"></path></svg><span _ngcontent-tbp-c81="" class="normal-show">Skip</span><span _ngcontent-tbp-c81="" apptextcolor="" class="hover-show" style="color: rgb(0, 46, 225);">Skip</span></a></div><!----><!----><!----><!----><!----></div></app-profile-actions><!----><!----></div></div><!----><!----></div><!----></app-profile-card><!----></div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

photo = soup.select_one(".profile-image")["src"]
name = soup.select_one("app-profile-card a")
# Assumes every title has the form "<some position> at <some company>" (the " at " match is case-sensitive):
title = soup.select_one('p:-soup-contains(" at ")')
intro = soup.select_one('p:-soup-contains("Summary") + p')

print(photo)
print(name.get_text(strip=True))
print(title.get_text(strip=True))
print(intro.get_text(strip=True))

Prints:

https://d1ew4vee5tqwao.cloudfront.net/things-images/67af0c730dc0c1266c3758fb1213d73f_1630606676.jpeg
Tori Combs
Marketing Manager at GhostDraft
I'm the Marketing Manager at GhostDraft, a leading customer communications and digital experience platform designed to help carriers streamline the li

EDIT: Version without the `:-soup-contains` pseudo-class (the two plain CSS selectors are kept as they are):

photo = soup.select_one(".profile-image")["src"]
name = soup.select_one("app-profile-card a")
# Assumes every title has the form "<some position> at <some company>" (the " at " match is case-sensitive):
title = soup.find(lambda tag: tag.name == "p" and " at " in tag.text)
intro = soup.find(
    lambda tag: tag.name == "p" and "Summary" in tag.text
).find_next_sibling("p")
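
Since the file contains many profiles, the same lookups can be applied to each card in turn. Below is a minimal sketch, assuming every profile sits in its own `<app-profile-card>` element as in the sample. The helper names `text_or_none` and `extract_profiles` and the tiny `sample` markup are made up for illustration; they are not part of the original data. Missing pieces come back as `None` instead of raising an error.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_profiles(html):
    """Yield one dict per <app-profile-card> element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    for card in soup.find_all("app-profile-card"):
        img = card.find("img", class_="profile-image")
        # The paragraph containing "Summary" is assumed to precede the introduction.
        summary = card.find(lambda tag: tag.name == "p" and "Summary" in tag.text)
        yield {
            "photo": img.get("src") if img is not None else None,
            "name": text_or_none(card.find("a")),
            # Same assumption as above: the title contains " at ".
            "title": text_or_none(
                card.find(lambda tag: tag.name == "p" and " at " in tag.text)
            ),
            "intro": text_or_none(
                summary.find_next_sibling("p") if summary is not None else None
            ),
        }


# Hypothetical, heavily simplified markup standing in for the saved file.
sample = """
<app-profile-card>
  <img class="profile-image" src="https://example.com/a.jpeg">
  <a>Alice Example</a>
  <p>Engineer at ExampleCo</p>
  <p>Summary</p>
  <p>Alice builds things.</p>
</app-profile-card>
<app-profile-card>
  <a>Bob Example</a>
  <p>Analyst at OtherCo</p>
</app-profile-card>
"""

profiles = list(extract_profiles(sample))
for profile in profiles:
    print(profile)
```

To run this on the saved file, read its contents into a string first (for example with `open(...).read()`) and pass that string to `extract_profiles` instead of `sample`.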

Upvotes: 1
