How to extract a table from a website(url) using python

Question

The NIST dataset website contains some data of copper, how can I grab the table in the left (titled “HTML table format “) from the website using a script of python. And only perverse the numbers in the second and third columns as shown in picture below. And store all data into a .csv file. I tried codes below, but it failed to get the correct format of the table.

import pandas as pd

# URL of the table
url = "https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z29.html"

# Read the table into a pandas dataframe
df = pd.read_html(url, header=0, index_col=0)[0]
# Save the processed table to a CSV file
df.to_csv("nist_table.csv", index=False)

HedgeHog · Accepted Answer

You could use:

.droplevel([0,1]) to remove the unwanted header rows
.dropna(axis=1, how='all') to remove the empty columns
.iloc[:,1:] to select only specific 3 columns

Example

import pandas as pd
url = "https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z29.html"

df = pd.read_html(url, header=[0,1,2,3])[1].droplevel([0,1], axis=1).dropna(axis=1, how='all').iloc[:,1:]
df

How to extract a table from a website(url) using python

Answers (2)

Example

Related Questions