A.D
A.D

Reputation: 135

Iterating over click for the table cells containing the link and finding it by link text while scraping data using selenium and python

I want to save all the scrape data from on click on Survey Number displaying the textarea containing the numbers in the CSV file ..From my code only the last pages having the survey numbers are saved in CSV file .Other data of page1 and page2 not able to save

enter code here

links = driver.find_elements_by_link_text('SurveyNo')
page_num = 2
while True:
      link1 = driver.find_element_by_link_text(str(page_num))
      link1.click()
      soup=BeautifulSoup(driver.page_source, 'lxml')
      table = soup.find("table" , attrs = 
      {'id':'ctl00_ContentPlaceHolder5_grdUrbanSubZoneWiseRate' })
      data = []
      for l in range(len(links)):

           newlinks = driver.find_elements_by_link_text('SurveyNo')
           newlinks[l].click()
           soup = BeautifulSoup(driver.page_source, 'lxml')
           td1 = soup.find("textarea", attrs={'class': 'textbox'})
           data.append(td1.text)
           for data1 in data:
                with open(os.path.expanduser("output.csv"),"w") as result:
                wr = csv.writer(result, dialect='excel')
                wr.writerow(data)
      page_num += 1

Output: Please see the imageOutfile image

Upvotes: 0

Views: 113

Answers (2)

KunduK
KunduK

Reputation: 33384

I have done a bit modification.instead of table I have taken the links count and then iterate the links.Let me know if this work for you.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.select import Select
URL = 'http://www.igrmaharashtra.gov.in/eASR/eASRCommon.aspx?hDistName=Pune'
driver = webdriver.Chrome()
driver.get(URL)
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlTaluka')).select_by_value('5')
Select(driver.find_element_by_name('ctl00$ContentPlaceHolder5$ddlVillage')).select_by_value('1853')

links = driver.find_elements_by_link_text('SurveyNo')

for l in range(len(links)):
    newlinks = driver.find_elements_by_link_text('SurveyNo')
    newlinks[l].click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    td1 = soup.find("textarea", attrs={'class': 'textbox'})
    print(td1.text)

OutPut:

1239 , 1520 , 1524 , 1231 , 1223 , 1339 , 1517 , 1518 , 1521 , 1523 , 1528 , 1540 , 1244 , 1519 , 1522 , 1243 , 1186 , 1187 , 1224 , 1227 , 1234 , 1235 , 1238 , 1241 , 1228 
1646 , 1504 , 1527 , 1532 , 1536 , 1541 , 2471 , 1641 , 2463 , 1691 , 1751 , 1755 , 1770 , 1915 , 1929 , 1634 , 1750 , 1771 , 1769 , 1767 , 1766 , 1754 , 1694 , 1752 , 1913 , 2468 , 1753 , 1914 , 1916 , 1919 , 1928 , 1693 , 2467 , 2028 , 2472 , 2473 , 2474 , 2462 , 1539 , 2464 , 1529 , 1530 , 1531 , 1533 , 1506 , 1534 , 1505 , 1535 , 1649 , 1538 , 1507 , 1547 , 1603 , 1604 , 1635 , 1636 , 1640 , 1642 , 1645 , 1647 , 1648 , 1537 
1620 , 1621 
1501 , 1482 , 1548 , 1544 , 1513 , 1497 , 1681 , 1677 , 1673 , 1552 , 1486 , 1525 , 1478 , 1474 , 1470 , 1466 , 1463 , 1459 , 1493 , 1490 , 1579 , 1669 , 1665 , 1662 , 1658 , 1608 , 1602 , 1599 , 1595 , 1591 , 1509 , 1583 , 1556 , 1575 , 1572 , 1568 , 1564 , 1560 , 1587 , 1574 , 1592 , 1593 , 1594 , 1596 , 1597 , 1598 , 1615 , 1600 , 1601 , 1590 , 1605 , 1606 , 1607 , 1609 , 1610 , 1611 , 1612 , 1613 , 1614 , 1576 , 1667 , 1616 , 1565 , 1566 , 1567 , 1569 , 1570 , 1577 , 1573 , 1589 , 1578 , 1580 , 1581 , 1582 , 1584 , 1585 , 1586 , 1588 , 1571 , 1671 , 1656 , 1657 , 1659 , 1660 , 1661 , 1663 , 1666 , 1664 , 1670 , 1653 , 1672 , 1674 , 1675 , 1676 , 1678 , 1679 , 1680 , 1563 , 1668 , 1633 , 1618 , 1619 , 1622 , 1623 , 1624 , 1625 , 1629 , 1630 , 1655 , 1632 , 1654 , 1638 , 1643 , 1644 , 1457 , 1650 , 1651 , 1652 , 1617 , 1631 , 1467 , 1479 , 1455 , 1456 , 1340 , 1458 , 1461 , 1462 , 1453 , 1465 , 1452 , 1468 , 1469 , 1471 , 1472 , 1473 , 1475 , 1476 , 1477 , 1464 , 1352 , 1562 , 1460 , 1341 , 1344 , 1345 , 1346 , 1348 , 1454 , 1351 , 1349 , 1353 , 1445 , 1446 , 1447 , 1448 , 1449 , 1450 , 1451 , 1350 , 1550 , 1514 , 1515 , 1516 , 1526 , 1542 , 1543 , 1545 , 1512 , 1549 , 1554 , 1551 , 1553 , 1555 , 1558 , 1559 , 1561 , 1480 , 1347 , 1546 , 1488 , 1481 , 1483 , 1557 , 1511 , 1484 , 1485 , 1487 , 1489 , 1491 , 1492 , 1498 , 1503 , 1494 , 1500 , 1502 , 1499 , 1508 , 1496 , 1510 , 1495 
1121 , 1125 , 1129 , 1133 , 1110 , 1083 , 1118 , 1114 , 1106 , 1102 , 1098 , 1094 , 1137 , 1087 , 1225 , 1091 , 1232 , 1289 , 1286 , 1282 , 1278 , 1274 , 1270 , 1266 , 1262 , 1259 , 1255 , 1251 , 1216 , 1240 , 1141 , 1071 , 1220 , 1212 , 1208 , 1204 , 1200 , 1196 , 1193 , 1189 , 1152 , 1148 , 1145 , 1247 , 1745 , 1718 , 1706 , 1710 , 1722 , 1725 , 1729 , 1733 , 1698 , 1741 , 1695 , 1749 , 1758 , 1762 , 1768 , 1714 , 1075 , 1079 , 1737 , 1301 , 1064 , 1702 , 1688 , 1067 , 1060 , 1297 , 1305 , 1309 , 1313 , 1316 , 1320 , 1293 , 1311 , 1302 , 1312 , 1308 , 1307 , 1310 , 1306 , 1299 , 1303 , 1300 , 1315 , 1328 , 1298 , 1296 , 1295 , 1304 , 1327 , 1334 , 1333 , 1264 , 1294 , 1332 , 1331 , 1326 , 1329 , 1318 , 1325 , 1324 , 1323 , 1322 , 1321 , 1319 , 1330 , 1254 , 1267 , 1265 , 1263 , 1261 , 1260 , 1258 , 1269 , 1256 , 1271 , 1253 , 1252 , 1250 , 1249 , 1335 , 1726 , 1257 , 1280 , 1291 , 1290 , 1288 , 1287 , 1285 , 1284 , 1268 , 1281 , 1292 , 1279 , 1277 , 1276 , 1275 , 1273 , 1272 , 1283 , 1314 , 1716 , 1735 , 1734 , 1732 , 1731 , 1730 , 1738 , 1727 , 1739 , 1724 , 1248 , 1721 , 1720 , 1719 , 1723 , 1728 , 1748 , 1765 , 1764 , 1763 , 1761 , 1760 , 1759 , 1736 , 1756 , 1715 , 1747 , 1746 , 1744 , 1743 , 1742 , 1740 , 1757 , 1628 , 1717 , 1686 , 1685 , 1684 , 1683 , 1682 , 1689 , 1637 , 1690 , 1627 , 1626 , 1343 , 1342 , 1338 , 1337 , 1639 , 1703 , 1713 , 1712 , 1711 , 1709 , 1708 , 1707 , 1687 , 1704 , 1336 , 1701 , 1700 , 1699 , 1697 , 1696 , 1692 , 1705 , 1112 , 1147 , 1122 , 1120 , 1119 , 1117 , 1116 , 1124 , 1113 , 1126 , 1111 , 1109 , 1108 , 1107 , 1105 , 1104 , 1115 , 1135 , 1146 , 1144 , 1143 , 1142 , 1140 , 1139 , 1123 , 1136 , 1100 , 1134 , 1132 , 1131 , 1130 , 1128 , 1127 , 1138 , 1062 , 1073 , 1072 , 1070 , 1069 , 1068 , 1066 , 1103 , 1063 , 1077 , 1061 , 1059 , 1057 , 1058 , 1246 , 1317 , 1065 , 1086 , 1090 , 1099 , 1097 , 1096 , 1095 , 1093 , 1074 , 1089 , 1076 , 1085 , 1084 , 1082 , 1081 , 1080 , 1078 , 1101 , 1092 , 1198 , 1209 , 1207 , 1206 , 1205 , 1203 , 1202 , 1188 , 1199 , 1213 , 1197 , 1195 , 1194 , 1192 , 1191 , 1190 , 1201 , 1229 , 1245 , 1242 , 1237 , 1236 , 1088 , 1149 , 1210 , 1230 , 1211 , 1226 , 1222 , 1221 , 1219 , 1218 , 1215 , 1214 , 1233 , 1185 , 1166 , 1165 , 1164 , 1163 , 1162 , 1160 , 1167 , 1157 , 1161 , 1217 , 1156 , 1155 , 1154 , 1153 , 1151 , 1150 , 1158 , 1179 , 1177 , 1175 , 1159 , 1178 , 1176 , 1174 , 1173 , 1169 , 1168 , 1171 , 1181 , 1170 , 1180 , 1172 

Upvotes: 1

C. Peck
C. Peck

Reputation: 3717

I think your mistake is possibly that link returns as a list and then you only tell it to click once. You may need another loop along the lines of

for x in link;
    x.click();
soup = BeautifulSoup(driver.page_source, 'lxml')
....

(and you may want to rename your variable as links so you can say for link in links)

Upvotes: 0

Related Questions