Shantanu Bedajna

Reputation: 581

Pandas concatenating dataframes results in "The truth value of a DataFrame is ambiguous"

My goal is to concatenate multiple pandas dataframes into a single dataframe, adding one in each iteration. In each iteration I grab a table and create a dataframe from it. Here is the commented code.

def visit_table_links():
    links = grab_initial_links()

    df_final = None
    for obi in links:

        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)

        dflist = []

        for attr in tree.xpath('//th[contains(normalize-space(text()),  "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)


                # append one row to the list: year and population are ints, municipality is a string
                dflist.append([year, population, obi[0]])

            except Exception as e:
                pass

        #creating a dataframe which works fine

        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])

        # the first time df_final is None, so just set df_final = df
        # after that df_final is the previous dataframe, so concat it with the new one

        if df_final != None:
            df_final = pd.concat(df_final, df)
        else:

            df_final = df


visit_table_links()

Here are the dataframes that are produced:

1st dataframe

   Year  Population Municipality
0  1970       10193   Cape Coral
1  1980       32103   Cape Coral
2  1990       74991   Cape Coral
3  2000      102286   Cape Coral
4  2010      154305   Cape Coral
5  2018      189343   Cape Coral

2nd dataframe

    Year  Population Municipality
0   1900         383   Clearwater
1   1910        1171   Clearwater
2   1920        2427   Clearwater
3   1930        7607   Clearwater
4   1940       10136   Clearwater
5   1950       15581   Clearwater
6   1960       34653   Clearwater
7   1970       52074   Clearwater
8   1980       85170   Clearwater
9   1990       98669   Clearwater
10  2000      108787   Clearwater
11  2010      107685   Clearwater
12  2018      116478   Clearwater

Trying to concatenate them results in this error:

ValueError                                Traceback (most recent call last)
<ipython-input-93-429ad4d9bce8> in <module>
     75 
     76 
---> 77 visit_table_links()
     78 
     79 

<ipython-input-93-429ad4d9bce8> in visit_table_links()
     62         print(df)
     63 
---> 64         if df_final != None:
     65             df_final = pd.concat(df_final, df)
     66         else:

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have searched a lot of threads and exhausted my resources. I'm new to pandas and don't understand why this is happening.

At first I thought it was because of duplicate indexes, so I used uuid.uuid4.int() values as the index with df.set_index('ID', drop=True, inplace=True), but I still get the same error.

Any guidance would be very helpful, thanks.

Edit 1:

Sorry for not being clear. The error is generated from

df_final = pd.concat(df_final, df)

when I try to concat the current dataframe with the previous dataframe.

Edit 2:

I passed the arguments as a list:

df_final = pd.concat([df_final, df])

Still the same error.

Upvotes: 0

Views: 532

Answers (2)

Shantanu Bedajna

Reputation: 581

From Sajan's suggestion of len(df_final) == 0, I had an idea: does it make a difference whether I initially set df_final to None or to an empty dataframe with the same columns?

Turns out, yes.

Here is the new code:

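# imports assumed from how the names are used below (html.fromstring suggests lxml)
import re

import pandas as pd
import requests
from lxml import html

def visit_table_links():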
    links = grab_initial_links()

    df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality'])
    for obi in links:
        resp = requests.get(obi[1])
        tree = html.fromstring(resp.content)

        dflist = []

        for attr in tree.xpath('//th[contains(normalize-space(text()),  "sometext")]/ancestor::table/tbody/tr'):
            population = attr.xpath('normalize-space(string(.//td[2]))')
            try:
                population = population.replace(',', '')
                population = int(population)
                year = attr.xpath('normalize-space(string(.//td[1]))')
                year = re.findall(r'\d+', year)
                year = ''.join(year)
                year = int(year)

                dflist.append([year, population, obi[0]])

            except Exception as e:
                pass

        df = pd.DataFrame(dflist, columns = ['Year', 'Population', 'Municipality'])

        df_final = pd.concat([df_final, df])

visit_table_links()

For some reason, setting df_final = None makes pandas throw that error, even though in the first iteration I assign df_final = df when df_final is None, so in the next iteration it should not matter what df_final was initially.

For some reason it does matter.

So using df_final = pd.DataFrame(columns=['Year', 'Population', 'Municipality']) instead of df_final = None fixed the issue.
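A likely explanation, sketched on a toy dataframe (the data here is only for illustration): once df_final holds a DataFrame, df_final != None is an elementwise comparison that returns a DataFrame of booleans, and an if statement cannot reduce that to a single True or False. The identity check df_final is not None returns a plain bool instead.

```python
import pandas as pd

# Toy data, standing in for the dataframe built in the first iteration.
df_final = pd.DataFrame({"Year": [1970, 1980], "Population": [10193, 32103]})

# Elementwise comparison: the result is a DataFrame, not a bool.
mask = df_final != None  # noqa: E711 (deliberately reproducing the original check)
result_type = type(mask).__name__

# Using that DataFrame as an `if` condition raises ValueError.
error_message = ""
try:
    if df_final != None:  # noqa: E711
        pass
except ValueError as exc:
    error_message = str(exc)

# An identity check never touches the DataFrame's contents and yields a bool.
is_set = df_final is not None

print(result_type)
print(error_message)
print(is_set)
```

With the original df_final = None initialization, replacing the condition with if df_final is not None: should avoid the error, as long as pd.concat is also given a list.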

Here is the merged dataframe:

    Year Population   Municipality
0   1970      10193     Cape Coral
1   1980      32103     Cape Coral
2   1990      74991     Cape Coral
3   2000     102286     Cape Coral
4   2010     154305     Cape Coral
5   2018     189343     Cape Coral
0   1900        383     Clearwater
1   1910       1171     Clearwater
2   1920       2427     Clearwater
3   1930       7607     Clearwater
4   1940      10136     Clearwater
5   1950      15581     Clearwater
6   1960      34653     Clearwater
7   1970      52074     Clearwater
8   1980      85170     Clearwater
9   1990      98669     Clearwater
10  2000     108787     Clearwater
11  2010     107685     Clearwater
12  2018     116478     Clearwater
0   1970       1489  Coral Springs
1   1980      37349  Coral Springs
2   1990      79443  Coral Springs
3   2000     117549  Coral Springs
4   2010     121096  Coral Springs
5   2018     133507  Coral Springs

Upvotes: 0

Sajan

Reputation: 1267

Instead of df_final != None, try using len(df_final) == 0.

Also, in the pd.concat call, try passing the dataframes as a list, i.e. df_final = pd.concat([df_final, df])
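One caveat: len(df_final) raises a TypeError while df_final is still None, so the length check only works if df_final starts out as a DataFrame. A way to sidestep the question entirely, sketched here with made-up sample rows in place of the scraped tables (sample_rows and frames are illustrative names, not from the question), is to collect the per-page dataframes in a list and call pd.concat once after the loop.

```python
import pandas as pd

# Hypothetical stand-in for the rows scraped from each page.
sample_rows = {
    "Cape Coral": [[1970, 10193], [1980, 32103]],
    "Clearwater": [[1900, 383], [1910, 1171]],
}

frames = []
for municipality, rows in sample_rows.items():
    df = pd.DataFrame(
        [[year, population, municipality] for year, population in rows],
        columns=["Year", "Population", "Municipality"],
    )
    frames.append(df)

# pd.concat takes a list (or other iterable) of dataframes as its first argument.
# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index.
df_final = pd.concat(frames, ignore_index=True)
print(df_final)
```

Concatenating once at the end also avoids copying the growing dataframe on every iteration.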

Upvotes: 1
