seanofdead
seanofdead

Reputation: 99

Cleaning Scraped Table Data

all I'm attempting to pull table data from a site that is displayed from a search box. I'm having trouble cleaning up the data in excel. Are there any tricks to get this a little neater? here is the code and screenshot below.

I need it to display the headers of Case Number, Style/Defendant, File Date and then the case numbers below, description of the defendant case, and file date.

soup = BeautifulSoup(driver.page_source,'html.parser')  
df = pd.read_html(str(soup))[0]
df.dropna(inplace=True)

print(df)
datatoexcel = pd.ExcelWriter('Case_Files.xlsx')

df.to_excel(datatoexcel)

datatoexcel.save()
print('DataFrame is Written to Excel Successfully.')

This is what the table looks like from the print(df) screen

 Unnamed: 0_level_0                                               Name Date of Birth
         Case Number                                  Style / Defendant     File Date
         Case Number                                  Style / Defendant     File Date
2        Case Number                                  Style / Defendant     File Date
3          08A575873  In the Matter of Petition for Compromise of Mi...    11/17/2008
6        Case Number                                  Style / Defendant     File Date
7          08A575874  In the Matter of Petition for Compromise of Mi...    11/17/2008
DataFrame is Written to Excel Successfully.

screenshot of excel file

Upvotes: 1

Views: 45

Answers (1)

Corralien
Corralien

Reputation: 120409

IIUC, drop the first two levels of column index then filter out your rows:

>>> df.droplevel(level=[0, 1], axis=1).query("`Case Number` != 'Case Number'")

  Case Number                                  Style / Defendant   File Date
3   08A575873  In the Matter of Petition for Compromise of Mi...  11/17/2008
7   08A575874  In the Matter of Petition for Compromise of Mi...  11/17/2008

Upvotes: 1

Related Questions