Reputation: 2870
My example HTML is:
<li><h4>A0: Pronouns</h4></li>
<li class="">
<a>bb</a>
<a>cc</a>
</li>
<li class="">
<a>dd</a>
<a>ee</a>
</li>
<li><h4>A0: Verbs Tenses & Conjugation</h4></li>
<li class="">
<a>ff</a>
<a>gg</a>
</li>
<li class="">
<a>hh</a>
<a>kk</a>
</li>
<li class="">
<a>jj</a>
<a>ii</a>
</li>
For each <li class=""><a> element, I would like to find the nearest preceding sibling <li><h4>. For example,
<li class=""><a>bb</a></li> corresponds to <li><h4>A0: Pronouns</h4></li>.
<li class=""><a>dd</a></li> corresponds to <li><h4>A0: Pronouns</h4></li>.
<li class=""><a>ff</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>.
<li class=""><a>hh</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>.
<li class=""><a>jj</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>.
Could you please explain how to do this?
import requests
from bs4 import BeautifulSoup

session = requests.Session()
headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}
link = 'https://french.kwiziq.com/revision/grammar'
r = session.get(link, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
for d in soup.select('.callout-body > ul li > a:nth-of-type(1)'):
    print(d)
Upvotes: 1
Views: 34
Reputation: 71461
You can use :is in your CSS selector:
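from bs4 import BeautifulSoup as soup
from collections import defaultdict

# Assumes html holds the markup from the question (for example, r.content).
soup1 = soup(html, 'html.parser')
d, l = defaultdict(list), None
for i in soup1.select('li > :is(a, h4):nth-of-type(1)'):
    if i.name == 'h4':
        l = i.get_text(strip=True)  # remember the current section heading
    else:
        d[l].append(i.get_text(strip=True))  # first <a> of this <li>
print(dict(d))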
Output:
{'A0: Pronouns': ['bb', 'dd'], 'A0: Verbs Tenses & Conjugation': ['ff', 'hh', 'jj']}
The output stores the first a of every li under its grammatical section. If you only want a one-to-one section-to-component result, you can use a dictionary comprehension:
new_d = {a:b for a, (b, *_) in d.items()}
Output:
{'A0: Pronouns': 'bb', 'A0: Verbs Tenses & Conjugation': 'ff'}
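To make the unpacking in that comprehension easier to follow, here is a small sketch that starts from the dictionary printed above. The names section, first, items, and alt are illustrative only and do not come from the answer.

```python
# Grouped result copied from the output shown earlier in this answer.
d = {
    'A0: Pronouns': ['bb', 'dd'],
    'A0: Verbs Tenses & Conjugation': ['ff', 'hh', 'jj'],
}

# (first, *_) unpacks each list: first takes element 0, *_ absorbs the rest.
new_d = {section: first for section, (first, *_) in d.items()}

# A more explicit variant that also skips sections with an empty list.
alt = {section: items[0] for section, items in d.items() if items}

print(new_d)
print(alt)
```

Note that the unpacking form raises ValueError if a section's list is empty, whereas the explicit variant simply drops that section.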
Upvotes: 1
Reputation: 195553
You can use .find_previous('h4'):
import requests
from bs4 import BeautifulSoup

url = "https://french.kwiziq.com/revision/grammar"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for a in soup.select(".callout li > a:nth-of-type(1)"):
    print(
        "{:<70} {}".format(
            a.get_text(strip=True), a.find_previous("h4").get_text(strip=True)
        )
    )
Prints:
Saying your name: Je m'appelle, Tu t'appelles, Vous vous appelez A0: Pronouns
Tu and vous are used for three types of you A0: Pronouns
Je becomes j' with verbs beginning with a vowel (elision) A0: Verbs Tenses & Conjugation
J'habite à [city] = I live in [city] A0: Idioms, Idiomatic Usage, and Structures
Je viens de + [city] = I'm from + [city] A0: Idioms, Idiomatic Usage, and Structures
Conjugate être (je suis, tu es, vous êtes) in Le Présent (present tense) A0: Verbs Tenses & Conjugation
Make most adjectives feminine by adding -e A0: Adjectives & Adverbs
Nationalities differ depending on whether you're a man or a woman (adjectives) A0: Adjectives & Adverbs
Conjugate avoir (j'ai, tu as, vous avez) in Le Présent (present tense) A0: Verbs Tenses & Conjugation
Using un, une to say "a" (indefinite articles) A0: Nouns & Articles
...
French vocabulary and grammar lists by theme C1: Idioms, Idiomatic Usage, and Structures
French Fill-in-the-Blanks Tests C1: Idioms, Idiomatic Usage, and Structures
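If you want to check the idea without hitting the live site, the same approach can be tried on the sample HTML from the question. This is only a sketch: the wrapping ul, the html variable, and the pairs list are my own assumptions, and the selector is shortened because the sample has no .callout container.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, wrapped in a <ul> (an assumption).
html = """
<ul>
<li><h4>A0: Pronouns</h4></li>
<li class=""><a>bb</a><a>cc</a></li>
<li class=""><a>dd</a><a>ee</a></li>
<li><h4>A0: Verbs Tenses &amp; Conjugation</h4></li>
<li class=""><a>ff</a><a>gg</a></li>
<li class=""><a>hh</a><a>kk</a></li>
<li class=""><a>jj</a><a>ii</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# For the first <a> of each <li>, walk backwards to the closest <h4>.
pairs = [
    (a.get_text(strip=True), a.find_previous("h4").get_text(strip=True))
    for a in soup.select("li > a:nth-of-type(1)")
]

for text, heading in pairs:
    print(text, "->", heading)
```

find_previous searches backwards through the whole document rather than only among siblings, which is why it still works even though each h4 sits inside a separate li.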
Upvotes: 1