arkadiy

Reputation: 766

How to Iterate through a list of dataframes in Pandas?

I have the following dataframes, combined into a list:

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})

df_list = [df,df1]

I would like to use a for loop to iterate through them and print each number. For this, I tried:

for num in df_list.numbers.unique():
    val = locals()[num]
    print(val)

But get an error:

AttributeError: 'list' object has no attribute 'numbers'

I also tried, more simply:

for num in df_list.numbers.unique():
    print(num)

But get an error:

AttributeError: 'list' object has no attribute 'numbers'

Similar questions were asked, without satisfactory responses.

Upvotes: 2

Views: 9726

Answers (2)

Niko Fohr

Reputation: 33770

Option A: Iterating over a value from list of dataframes

Since you have two dataframes, you will have to:

  • Iterate through the dataframes one by one
  • Then, for each dataframe (df_tmp), iterate over all of its unique numbers
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
   ...: df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})

In [3]: df_list = [df,df1]

In [4]: for df_tmp in df_list:
   ...:     for num in df_tmp['numbers'].unique():
   ...:         print(num)
   ...:
1
2
3
7
44
93

Note: with this approach, the printed values are not necessarily unique across dataframes. For example, if 2 appears in both df['numbers'] and df1['numbers'], it will be printed twice.
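If you want to keep Option A's loop but skip values that an earlier dataframe already produced, one possible sketch is to track the values seen so far in a set. Here `df1` is deliberately changed (hypothetically, not the question's data) to contain `2`, so the duplicate is visible; `seen` and `ordered` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
# Hypothetical variant of df1 that shares the value 2 with df
df1 = pd.DataFrame({'numbers': [7, 2, 93], 'colors': ['red', 'white', 'blue']})
df_list = [df, df1]

seen = set()   # values already printed, shared across all dataframes
ordered = []   # unique values in first-seen order
for df_tmp in df_list:
    for num in df_tmp['numbers'].unique():
        if num not in seen:
            seen.add(num)
            ordered.append(num)
            print(num)
```

This prints 1, 2, 3, 7 and 93, with 2 printed only once, and it never builds a combined dataframe.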

Option B: Merging the dataframes before iterating

Sometimes it is more useful to create a single dataframe that combines all of your dataframes. You can do this with pd.concat (see footnote 1) like this:

In [17]: df_new = pd.concat(df_list)

In [18]: df_new
Out[18]:
   numbers colors
0        1    red
1        2  white
2        3   blue
0        7    red
1       44  white
2       93   blue
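Note the repeated index labels (0, 1, 2, 0, 1, 2): by default pd.concat keeps each input dataframe's original row labels. If you would rather have a fresh 0..n-1 index, a small sketch using the `ignore_index` parameter:

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})
df_list = [df, df1]

# ignore_index=True discards the original row labels and renumbers rows 0..n-1
df_new = pd.concat(df_list, ignore_index=True)
print(df_new)
```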

Then, you could iterate over all the unique elements in 'numbers' by simply:

In [19]: for num in df_new['numbers'].unique():
    ...:     print(num)
    ...:
1
2
3
7
44
93
  • Because unique() is applied to the combined column, each number is printed only once. The downside is that if all you need is to iterate over the unique elements of a column that exists in multiple dataframes, creating a new dataframe is a bit of overhead. This brings me to Option C.
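A middle ground that stays in pandas (a sketch, not benchmarked) is to concatenate only the column you need instead of whole dataframes, so the other columns are not copied into the result:

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})
df_list = [df, df1]

# Concatenate only the 'numbers' Series from each dataframe, not whole frames
all_numbers = pd.concat([d['numbers'] for d in df_list])
for num in all_numbers.unique():  # unique() keeps first-appearance order
    print(num)
```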

Option C: Iterating over just the unique values

  • If all you want is to iterate over the unique elements of one column that exists in multiple dataframes, you do not need a temporary dataframe. Instead, you can take the union of the sets of values from each dataframe's column:
# or: nums = set().union(*(map(lambda x:set(x['numbers']), (df, df1))))
In [30]: nums = set().union(*(set(df_['numbers']) for df_ in (df, df1)))

In [31]: nums
Out[31]: {1, 2, 3, 7, 44, 93}

In [32]: for num in nums:
    ...:     print(num)
    ...:
1
2
3
7
44
93
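A Python set has no guaranteed iteration order, so the sorted-looking output above should not be relied on in general. The sketch below iterates over df_list directly instead of the hard-coded (df, df1), passes each Series straight to set.union (which accepts any iterables), and sorts the result when a predictable order matters:

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})
df_list = [df, df1]

# set.union accepts any iterables, so each Series can be passed directly
nums = set().union(*(d['numbers'] for d in df_list))

# Sets are unordered; sort if you need a predictable sequence
for num in sorted(nums):
    print(num)
```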


1 pd.concat() takes an iterable of dataframes (for example, a list, tuple or generator) as its first argument and returns a brand new dataframe that you can use.
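To illustrate the footnote, a short sketch passing the same two dataframes to pd.concat as a list, a tuple and a generator expression; all three should produce the same result:

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3], 'colors': ['red', 'white', 'blue']})
df1 = pd.DataFrame({'numbers': [7, 44, 93], 'colors': ['red', 'white', 'blue']})

from_list = pd.concat([df, df1])
from_tuple = pd.concat((df, df1))
# A generator expression works too; pd.concat materializes it internally
from_gen = pd.concat(d for d in (df, df1))

print(from_list.equals(from_tuple), from_list.equals(from_gen))
```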

Upvotes: 6

Rachel Shalom

Reputation: 419

You are trying to iterate over the list, but the list items are dataframes, not numbers, and a list has no numbers attribute. You should concatenate the dataframes first:

dfs_list=pd.concat(df_list)

dfs_list looks like this:

   numbers colors
0        1    red
1        2  white
2        3   blue
0        7    red
1       44  white
2       93   blue

Now the loop will work:

for num in dfs_list.numbers.unique():
    print(num)

Output:

1
2
3
7
etc.

Upvotes: 2
