Reputation: 451
I used XPath Helper to help me scrape a table from a website that requires login.
Code:
g=driver.find_element_by_xpath("//table[@id='DataGrid']/tbody").text
print(g)
The result looks like this (its data type is string):
#@5@#*&(
&*(%#IO
!@%&*(O)
2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A
2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C
2018/02/02 211 MITSUBISHI FORTIS ALK-3501 2015 04 1798 100000 white C+
I want to write this string to a CSV file without the first three lines, using commas to separate the values; otherwise each row ends up combined in a single cell.
Code here:
if "#@5@#*&(" in g and "&*(%#IO" in g and "!@%&*(O)" in g:
    g=g.replace("#@5@#*&(", "")
    g=g.replace("&*(%#IO", "")
    g=g.replace("!@%&*(O)", "")
g=g.replace(' ', ',')
print(g)
file_name="C:/Test.csv"
with open(file_name,'a') as file:
    file.write(g+'\n')
What bothers me is that I don't know how to delete the first three lines. I replaced them with empty strings, but the blank lines are still there, and every time I write to the CSV they still take up rows. The second problem is that separating the values with commas introduces errors: "MAZDA 5", for example, should not be split. Is there a good way to solve this, or should I just correct it in the CSV file?
source code:
<tr align="left" style="height:40px;">
<td>2018/02/02</td>
<td>206</td>
<td>MAZDA</td>
<td>MAZDA 5</td>
<td>5660-ES</td>
<td>2006</td>
<td>01</td>
<td>1999</td>
<td>70000</td>
<td>white</td>
<td align="center" valign="middle"></td>
<td>A</td>
</tr>
Upvotes: 0
Views: 610
Reputation: 1213
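When it comes to removing the first 3 lines, you could either:
- include the trailing newline in each string you replace (e.g. "#@5@#*&(\n"); or
- drop the first three lines outright with "\n".join(g.split("\n")[3:])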
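Both options can be sketched on a shortened copy of the sample text from the question. The variable names `by_replace` and `by_slice` are made up for illustration, and the sketch assumes each junk line ends with a newline.

```python
# Shortened sample of the scraped text from the question.
g = (
    "#@5@#*&(\n"
    "&*(%#IO\n"
    "!@%&*(O)\n"
    "2018/02/02 206 MAZDA MAZDA 5 5660-ES 2006 01 1999 70000 white A\n"
    "2018/02/02 210 BMW 330 9378-W6 2006 01 2996 80000 black C"
)

# Option 1: replace each junk line together with its trailing newline,
# so no empty line is left behind.
by_replace = g
for junk in ("#@5@#*&(", "&*(%#IO", "!@%&*(O)"):
    by_replace = by_replace.replace(junk + "\n", "")

# Option 2: split into lines, skip the first three, and join the rest.
by_slice = "\n".join(g.split("\n")[3:])

print(by_slice)
```

Option 2 does not depend on the exact junk text, so it keeps working if those three lines change, as long as there are always exactly three of them.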
The second issue is much harder: by saving the whole text of tbody into one variable, you have lost the information about the separators. There is now no way to tell whether a space was part of the original cell or was added automatically between cells. I'd suggest scraping each td cell individually.
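A minimal sketch of the per-cell approach, not a definitive implementation. The helper names `scrape_rows` and `append_rows` and the `min_cells` parameter are hypothetical. It assumes `driver` is the Selenium WebDriver from the question, and that the three unwanted lines come from rows with fewer than two td cells, which the question does not confirm. The locator string "xpath" is the value of Selenium's `By.XPATH`.

```python
import csv

# XPath for the rows of the table used in the question.
ROW_XPATH = "//table[@id='DataGrid']/tbody/tr"


def scrape_rows(driver, min_cells=2):
    """Return one list of cell texts per table row.

    Assumption: junk rows have fewer than `min_cells` td cells.
    """
    rows = []
    for tr in driver.find_elements("xpath", ROW_XPATH):
        # Reading each td separately keeps "MAZDA 5" as a single value.
        cells = [td.text.strip() for td in tr.find_elements("xpath", "./td")]
        if len(cells) >= min_cells:
            rows.append(cells)
    return rows


def append_rows(rows, file_name):
    """Append rows to a CSV file; csv.writer handles separators and quoting."""
    with open(file_name, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

It would be used as `append_rows(scrape_rows(driver), "C:/Test.csv")`. Empty cells, such as the one before the grade column in the HTML source, come out as empty CSV fields, so the columns stay aligned.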
Upvotes: 1
Reputation: 11
To remove the first few lines from a string, just figure out the position of the first relevant piece of info.
temp = "adknsad"
temp[2:]
would output "knsad".
The same approach should work for the string you have.
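One way to find that position is sketched below. The function name `strip_leading_junk` is hypothetical, and it assumes every data row starts with a date such as 2018/02/02, as in the sample output.

```python
import re


def strip_leading_junk(text):
    """Return text starting at the first line that begins with a date."""
    # Assumption: data rows start with YYYY/MM/DD; junk lines do not.
    match = re.search(r"^\d{4}/\d{2}/\d{2}\b", text, flags=re.MULTILINE)
    if match is None:
        # No date found: return the text unchanged rather than guess.
        return text
    return text[match.start():]
```

Calling `strip_leading_junk(g)` on the scraped string then slices away everything before the first dated row, however many junk lines precede it.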
I don't think there is any simple way to solve the Mazda 5 thing.
Upvotes: 1