Reputation: 13
I've written the following code to scrape the tables from http://acuratings.conservative.org/acu-federal-legislative-ratings/?year1=1975&chamber=11&state1=0&sortable=1. The goal is to save all the tables into one dataframe.
import pandas as pd
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
acu_browser = webdriver.Chrome(ChromeDriverManager().install())
acu_browser.get('http://acuratings.conservative.org/acu-federal-legislative-ratings/?year1=1975&chamber=11&state1=0&sortable=1')
time.sleep(10)
acu_html = acu_browser.page_source
acu_tables = pd.read_html(acu_html)
acu_tables = pd.concat(acu_tables)
However, the last line is giving me the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-16e0df40412a> in <module>
13 acu_html = acu_browser.page_source
14 acu_tables = pd.read_html(acu_html)
---> 15 acu_tables = pd.concat(acu_tables)
16
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
282 )
283
--> 284 return op.get_result()
285
286
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
490 obj_labels = mgr.axes[ax]
491 if not new_labels.equals(obj_labels):
--> 492 indexers[ax] = obj_labels.reindex(new_labels)[1]
493
494 mgrs_indexers.append((obj._data, indexers))
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
2423 else:
2424 # hopefully?
-> 2425 target = MultiIndex.from_tuples(target)
2426
2427 if (
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
487 tuples = tuples._values
488
--> 489 arrays = list(lib.tuples_to_object_array(tuples).T)
490 elif isinstance(tuples, list):
491 arrays = list(lib.to_object_array_tuples(tuples).T)
pandas/_libs/lib.pyx in pandas._libs.lib.tuples_to_object_array()
TypeError: Expected tuple, got str
Any help will be really appreciated.
Upvotes: 0
Views: 2337
Reputation: 681
I don't have a good answer to this as of now.
One hacky way around this would be to do something like the following:
accumulator_df = acu_tables[1]
for i in range(2, len(acu_tables)):
    accumulator_df = pd.concat((accumulator_df, acu_tables[i]), ignore_index=True)
However, this won't work directly. Since the column names are not the same across the tables, pd.concat is not able to line them up properly.
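To see the mismatch for yourself, something like the sketch below might help. It only reports the kind of column index and the column count for each table. The helper name describe_columns and the two toy tables are made up for illustration; in the real case you would pass in the list returned by pd.read_html. The "Expected tuple, got str" message in the traceback suggests, though I haven't verified it against the page, that some tables have MultiIndex columns while others have plain string columns.

```python
import pandas as pd


def describe_columns(tables):
    """Return (position, column index type, column count) for each table."""
    return [
        (i, type(table.columns).__name__, table.shape[1])
        for i, table in enumerate(tables)
    ]


# Toy stand-ins for the scraped tables (hypothetical data, not from the site).
multi_header = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "c")]),
)
flat_header = pd.DataFrame([[4, 5, 6]], columns=["p", "q", "r"])

report = describe_columns([multi_header, flat_header])
for row in report:
    print(row)
```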
Since all the tables have 35 columns, one way around this would be to simply rename the columns to some fixed values and then concat.
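A rough sketch of that renaming idea is below. The function name concat_with_fixed_columns and the col_0, col_1, ... labels are my own placeholders, and the toy tables use 3 columns instead of 35 so the example runs without the browser. It also assumes that tables with a different column count should just be skipped, which may or may not be what you want.

```python
import pandas as pd


def concat_with_fixed_columns(tables, n_cols=35):
    """Stack tables that share a column count but not column labels."""
    # Placeholder labels; replace with meaningful names if you know them.
    fixed_names = [f"col_{i}" for i in range(n_cols)]
    renamed = []
    for table in tables:
        if table.shape[1] != n_cols:
            continue  # assumption: ignore tables of a different width
        table = table.copy()          # leave the original table untouched
        table.columns = fixed_names   # overwrite flat or MultiIndex headers
        renamed.append(table)
    return pd.concat(renamed, ignore_index=True)


# Toy stand-ins for the scraped tables (hypothetical data, not from the site).
multi_header = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "c")]),
)
flat_header = pd.DataFrame([[4, 5, 6]], columns=["p", "q", "r"])
too_narrow = pd.DataFrame([[7, 8]], columns=["p", "q"])

combined = concat_with_fixed_columns(
    [multi_header, flat_header, too_narrow], n_cols=3
)
print(combined)
```

With the real data this would be called as concat_with_fixed_columns(acu_tables) before acu_tables is overwritten. Note that the original header text is thrown away, so if the headers differ in meaning between tables, the stacked columns will not be comparable.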
Upvotes: 1