Reputation: 581
My goal is to concatenate multiple pandas dataframes into a single dataframe, adding one in each iteration. I am grabbing a table from each page and creating a dataframe from it. Here is the commented code:
def visit_table_links():
    links = grab_initial_links()
    df_final = None
    for obi in links:
        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)
        dflist = []
        for attr in tree.xpath('//th[contains(normalize-space(text()), "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)
                # append a row to the list: 3 values, the first two are integers, the last is a string
                dflist.append([year, population, obi[0]])
            except Exception as e:
                pass
        # creating a dataframe, which works fine
        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])
        # the first time df_final is None, so just set df_final = df
        # the next time df_final is the previous dataframe, so concat it with the new one
        if df_final != None:
            df_final = pd.concat(df_final, df)
        else:
            df_final = df

visit_table_links()
Here are the dataframes that are being produced:
1st dataframe
Year Population Municipality
0 1970 10193 Cape Coral
1 1980 32103 Cape Coral
2 1990 74991 Cape Coral
3 2000 102286 Cape Coral
4 2010 154305 Cape Coral
5 2018 189343 Cape Coral
2nd dataframe
Year Population Municipality
0 1900 383 Clearwater
1 1910 1171 Clearwater
2 1920 2427 Clearwater
3 1930 7607 Clearwater
4 1940 10136 Clearwater
5 1950 15581 Clearwater
6 1960 34653 Clearwater
7 1970 52074 Clearwater
8 1980 85170 Clearwater
9 1990 98669 Clearwater
10 2000 108787 Clearwater
11 2010 107685 Clearwater
12 2018 116478 Clearwater
Trying to concatenate them results in this error:
ValueError Traceback (most recent call last)
<ipython-input-93-429ad4d9bce8> in <module>
75
76
---> 77 visit_table_links()
78
79
<ipython-input-93-429ad4d9bce8> in visit_table_links()
62 print(df)
63
---> 64 if df_final != None:
65 df_final = pd.concat(df_final, df)
66 else:
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __nonzero__(self)
1476 raise ValueError("The truth value of a {0} is ambiguous. "
1477 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478 .format(self.__class__.__name__))
1479
1480 __bool__ = __nonzero__
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have searched a lot of threads and exhausted my resources. I'm new to pandas and don't understand why this is happening.
At first I thought it was because of duplicate indexes, so I used uuid.uuid4().int as the index
with df.set_index('ID', drop=True, inplace=True),
but I still get the same error.
Any guidance would be very helpful, thanks.
Edit 1:
Sorry for not being clear. The error is coming from
df_final = pd.concat(df_final, df)
when I try to concat the current dataframe with the previous dataframe.
Edit 2:
I passed the arguments as a list:
df_final = pd.concat([df_final, df])
Still the same error.
Upvotes: 0
Views: 532
Reputation: 581
Sajan's suggestion of len(df_final) == 0 gave me an idea: does it make a difference whether df_final is initially set to None or to an empty dataframe with the same columns?
It turns out it does.
Here is the new code:
def visit_table_links():
    links = grab_initial_links()
    df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality'])
    for obi in links:
        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)
        dflist = []
        for attr in tree.xpath('//th[contains(normalize-space(text()), "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)
                dflist.append([year, population, obi[0]])
            except Exception as e:
                pass
        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])
        df_final = pd.concat([df_final, df])

visit_table_links()
For some reason, setting df_final = None makes pandas throw that error, even though in the first iteration I am assigning df_final = df when df_final is None, so in the next iteration it should not matter what df_final was initially. For some reason it does matter, so using df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality']) instead of df_final = None fixed the issue.
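A likely explanation, going by the traceback in the question (it points at the `if df_final != None:` line, not at `pd.concat`): on the second iteration df_final is already a DataFrame, and `!=` on a DataFrame compares element by element and returns another DataFrame, which `if` cannot reduce to a single True or False. The sketch below reproduces that on a small made-up frame and shows the identity check `is not None`, which avoids the problem. The sample values are taken from the tables above; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Year": [1970, 1980], "Population": [10193, 32103]})

# `!=` is applied element by element, so the result is a DataFrame of
# booleans rather than a single bool.
mask = df != None
print(type(mask).__name__)

# Putting that DataFrame in an `if` is what raises the ValueError.
error_message = None
try:
    if df != None:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# An identity check never looks at the DataFrame's contents,
# so it always yields a plain bool.
is_set = df is not None
print(is_set)
```

With `if df_final is not None:` in place of `if df_final != None:`, the original None-initialised version should work as well, provided pd.concat is given a list.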
Here is the merged dataframe:
Year Population Municipality
0 1970 10193 Cape Coral
1 1980 32103 Cape Coral
2 1990 74991 Cape Coral
3 2000 102286 Cape Coral
4 2010 154305 Cape Coral
5 2018 189343 Cape Coral
0 1900 383 Clearwater
1 1910 1171 Clearwater
2 1920 2427 Clearwater
3 1930 7607 Clearwater
4 1940 10136 Clearwater
5 1950 15581 Clearwater
6 1960 34653 Clearwater
7 1970 52074 Clearwater
8 1980 85170 Clearwater
9 1990 98669 Clearwater
10 2000 108787 Clearwater
11 2010 107685 Clearwater
12 2018 116478 Clearwater
0 1970 1489 Coral Springs
1 1980 37349 Coral Springs
2 1990 79443 Coral Springs
3 2000 117549 Coral Springs
4 2010 121096 Coral Springs
5 2018 133507 Coral Springs
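The merged output above repeats the index labels (0 to 5, then 0 to 12, and so on) because pd.concat keeps each frame's own index by default. If a single continuous index is preferred, passing ignore_index=True is one option. A minimal sketch, using two rows from each of the first two tables as sample data:

```python
import pandas as pd

cape_coral = pd.DataFrame(
    [[1970, 10193, "Cape Coral"], [1980, 32103, "Cape Coral"]],
    columns=["Year", "Population", "Municipality"],
)
clearwater = pd.DataFrame(
    [[1900, 383, "Clearwater"], [1910, 1171, "Clearwater"]],
    columns=["Year", "Population", "Municipality"],
)

# Default behaviour: each frame keeps its own labels, so 0 and 1 appear twice.
kept = pd.concat([cape_coral, clearwater])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([cape_coral, clearwater], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```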
Upvotes: 0
Reputation: 1267
Instead of df_final != None, try using len(df_final) == 0.
Also, in the pd.concat call, try passing the arguments as a list, i.e. df_final = pd.concat([df_final, df])
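Note that len(df_final) only works once df_final is a DataFrame; len(None) raises a TypeError, so this check needs df_final to start out as an empty DataFrame. One way to avoid the sentinel altogether is to collect the per-page frames in a list and call pd.concat once at the end. The sketch below assumes a hypothetical helper, combine_tables, and feeds it hard-coded rows in place of the scraped pages:

```python
import pandas as pd

COLUMNS = ["Year", "Population", "Municipality"]


def combine_tables(tables):
    """Build one DataFrame from (municipality, rows) pairs.

    `tables` is a hypothetical stand-in for the scraped pages: each item is
    (municipality_name, [(year, population), ...]).
    """
    frames = []
    for municipality, rows in tables:
        records = [[year, population, municipality] for year, population in rows]
        frames.append(pd.DataFrame(records, columns=COLUMNS))

    # pd.concat raises ValueError when given an empty list, so return an
    # empty frame with the expected columns in that case.
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)


sample = [
    ("Cape Coral", [(1970, 10193), (1980, 32103)]),
    ("Clearwater", [(1900, 383), (1910, 1171), (1920, 2427)]),
]
combined = combine_tables(sample)
print(combined)
```

Concatenating once also avoids copying the growing result on every pass through the loop.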
Upvotes: 1