Reputation: 41
I'm trying to extract data from a table on a web page.
Some table cells contain a span tag, which is breaking the extraction: the cell value ends up concatenated with the span tag's content. I want to extract the cell content and the span content into separate cells. Any help would be greatly appreciated.
Here is the code:
import pandas as pd
url = "https://www.sqimway.com/lte_band.php"
lte_band = pd.read_html(url)
lte_band[0]
Upvotes: 1
Views: 148
Reputation: 2348
If you have pandas 0.24+, you can use pandas.MultiIndex.to_flat_index() to turn the multi-level column header into an index of tuples, and then join the unique values in each tuple into a single column name.
# Set a new DataFrame variable.
df = lte_band[0]
# Note: set() does not preserve order, so sort by each label's position in the tuple to retain it.
df.columns = list(map(lambda q: " ".join(sorted(set(q), key = q.index)), df.columns.to_flat_index()))
Output of df.columns:
Index(['Band', 'Name', 'Mode', 'Downlink (MHz) Low Earfcn',
'Downlink (MHz) Middle Earfcn', 'Downlink (MHz) High Earfcn',
'BandwidthDL/UL (MHz)', 'Uplink (MHz) Low Earfcn',
'Uplink (MHz) Middle Earfcn', 'Uplink (MHz) High Earfcn',
'Duplex spacing(MHz)', 'Geographicalarea', '3GPPrelease',
'Channel bandwidth (MHz) 1.4', 'Channel bandwidth (MHz) 3',
'Channel bandwidth (MHz) 5', 'Channel bandwidth (MHz) 10',
'Channel bandwidth (MHz) 15', 'Channel bandwidth (MHz) 20'],
dtype='object')
Formatted:
Band
Name
Mode
Downlink (MHz) Low Earfcn
Downlink (MHz) Middle Earfcn
Downlink (MHz) High Earfcn
BandwidthDL/UL (MHz)
Uplink (MHz) Low Earfcn
Uplink (MHz) Middle Earfcn
Uplink (MHz) High Earfcn
Duplex spacing(MHz)
Geographicalarea
3GPPrelease
Channel bandwidth (MHz) 1.4
Channel bandwidth (MHz) 3
Channel bandwidth (MHz) 5
Channel bandwidth (MHz) 10
Channel bandwidth (MHz) 15
Channel bandwidth (MHz) 20
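To try the flattening step without fetching the page, here is a minimal sketch on a hand-built MultiIndex. The column tuples are made up to mimic a two-row header (they are not parsed from the site), and the helper name flatten is hypothetical. It uses dict.fromkeys instead of sorted(set(...)) to drop repeated labels while keeping their order, which should give the same result on Python 3.7+.

```python
import pandas as pd

# Hypothetical two-level header, mimicking what read_html can return for a
# table whose header spans two rows (single-row headers repeat their label).
columns = pd.MultiIndex.from_tuples([
    ("Band", "Band"),
    ("Downlink (MHz)", "Low Earfcn"),
    ("Downlink (MHz)", "High Earfcn"),
])
df = pd.DataFrame([[1, 2110, 2170]], columns=columns)


def flatten(levels):
    # dict.fromkeys drops repeated labels while keeping first-seen order.
    return " ".join(dict.fromkeys(levels))


# to_flat_index() yields one tuple of level labels per column.
df.columns = [flatten(t) for t in df.columns.to_flat_index()]
print(list(df.columns))
```

After this, each column has a single string name, so it can be selected directly, for example df["Downlink (MHz) Low Earfcn"].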
Upvotes: 1