Web scraping with Python for a specific section

Question

I have a list created with soup.findAll, and its first element contains the information below. From this element I only need the address, which is
"54000 NANCY 47 RUE SERGENT BLANDAN". How can I extract it?

 {
  "div": {
    "@class": "result-left",
    "h3": "Establishment(s)",
    "div": [
      {
        "label": "Status:",
        "#text": "Closed"
      },
      {
        "p": {
          "label": "Brand name:",
          "#text": "LE ZODIAC"
        }
      },
      {
        "p": {
          "label": "Usual name:"
        }
      },
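      {
        "p": {
          "label": "Address:",
          "br": [
            "",
            "54000\n\t\t\t\t\t\t\t\t\t\t\tNANCY"
          ],
          "#text": "47\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tRUE\n\t\t\t\t\t\t\t\t\t\tSERGENT BLANDAN"
        }
      },
      {
        "p": {
          "label": "Principal activity:",
          "#text": "47.78C - \n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\tAutres commerces de détail spécialisés divers"
        }
      },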
      {
        "p": {
          "label": {
            "sup": "*",
            "#text": [
              "Employee numbers",
              ":"
            ]
          }
        }
      },
      {
        "p": {
          "label": "Year employee numbers verified:"
        }
      }
    ]
  }
}

QHarr · Accepted Answer

You can extract the items of interest from the structure and then use re to clean up the resulting string. Note that this is specific to the JSON you posted.

import re

s = {
  "div": {
    "@class": "result-left",
    "h3": "Establishment(s)",
    "div": [
      {
        "label": "Status:",
        "#text": "Closed"
      },
      {
        "p": {
          "label": "Brand name:",
          "#text": "LE ZODIAC"
        }
      },
      {
        "p": {
          "label": "Usual name:"
        }
      },
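      {
        "p": {
          "label": "Address:",
          "br": [
            "",
            "54000\n\t\t\t\t\t\t\t\t\t\t\tNANCY"
          ],
          "#text": "47\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tRUE\n\t\t\t\t\t\t\t\t\t\tSERGENT BLANDAN"
        }
      },
      {
        "p": {
          "label": "Principal activity:",
          "#text": "47.78C - \n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\tAutres commerces de détail spécialisés divers"
        }
      },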
      {
        "p": {
          "label": {
            "sup": "*",
            "#text": [
              "Employee numbers",
              ":"
            ]
          }
        }
      },
      {
        "p": {
          "label": "Year employee numbers verified:"
        }
      }
    ]
  }
}

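# Join the postcode/city part and the street part, then collapse every
# run of whitespace (the embedded newlines and tabs) into a single space.
address = s['div']['div'][3]['p']
result = re.sub(r'\s+', ' ', ' '.join([address['br'][1], address['#text']]))
print(result)  # 54000 NANCY 47 RUE SERGENT BLANDAN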
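
The hard-coded index 3 only works while the address stays the fourth entry. As a hedged alternative, the sketch below looks the entry up by its "Address:" label instead. The function name find_address and the trimmed record are hypothetical, chosen for illustration, and the shape of the data is assumed to match the JSON in the question.

```python
def find_address(record):
    """Return the cleaned address from a record shaped like the posted JSON, or None."""
    for entry in record["div"]["div"]:
        p = entry.get("p")
        # Skip entries without a "p" dict (e.g. the "Status:" entry)
        # and entries whose label is not exactly "Address:".
        if not isinstance(p, dict) or p.get("label") != "Address:":
            continue
        # "br" holds the postcode/city part, "#text" the street part.
        parts = [part for part in p.get("br", []) if part]
        parts.append(p.get("#text", ""))
        # str.split() with no arguments splits on any run of whitespace,
        # so re-joining with one space drops the newlines and tabs.
        return " ".join(" ".join(parts).split())
    return None


# Trimmed-down sample in the same shape as the question's data (assumption).
record = {
    "div": {
        "div": [
            {"label": "Status:", "#text": "Closed"},
            {"p": {"label": "Brand name:", "#text": "LE ZODIAC"}},
            {"p": {"label": "Address:",
                   "br": ["", "54000\n\t\tNANCY"],
                   "#text": "47\n\t\t\n\t\tRUE\n\t\tSERGENT BLANDAN"}},
        ]
    }
}

print(find_address(record))  # 54000 NANCY 47 RUE SERGENT BLANDAN
```

If no entry carries the "Address:" label, the function returns None rather than raising, so the caller can decide how to handle missing addresses.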
