thepule

Reputation: 1751

How to dynamically refer to dataframes in a for loop in Python

I am very new to Python and this is probably a simple question, but I cannot seem to find a solution.

I have several pandas data frames with names like: output_1, output_2, ..., output_n

I want to sum their lengths (as in the number of their rows) and I came up with something like this:

total = 0
for num in range(1, n + 1):  # n + 1 so that output_n is included
    nameframe = "output_" + str(num)
    total += nameframe.shape[0]  # fails: nameframe is a str, not a DataFrame

The problem is that Python sees nameframe as a string, not as the name of a dataframe.

Looking around I found a potential solution:

total = 0
for num in range(1, n + 1):  # n + 1 so that output_n is included
    x = globals()["output_%s" % num]  # look the frame up by its variable name
    total += x.shape[0]

This seems to work, but the use of globals() seems to be strongly discouraged. So what is the most pythonic way to achieve my purpose?

Upvotes: 0

Views: 2152

Answers (1)

chrisb

Reputation: 52266

The most pythonic way would probably be to store your dataframes in a list. E.g.,

dfs = [output_1, output_2, ...]
df_length = sum(x.shape[0] for x in dfs)
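
If you still want to refer to each frame by its number, a dict keyed by that number is one option. This is only a sketch: the `frames` name and the small stand-in frames are made up for illustration, and in practice you would put each frame into the dict where you create it instead of assigning it to output_1, output_2, and so on.

```python
import pandas as pd

# Hypothetical stand-ins for output_1, output_2, output_3:
# frame number `num` gets num + 1 rows.
frames = {
    num: pd.DataFrame({"value": list(range(num + 1))})
    for num in range(1, 4)
}

# Sum the row counts without looking anything up by variable name.
total_rows = sum(df.shape[0] for df in frames.values())
print(total_rows)

# An individual frame is still reachable by its number.
print(frames[2].shape[0])
```

Note that this relies on the built-in sum, so avoid naming your own variable sum as in the question.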

Alternatively, you could look at storing your data in a combined pandas data structure, assuming they are all related in some way. E.g., if each dataframe is a different group, you could set a MultiIndex on the combined frame, like

df = pd.concat([output_1, output_2, ...], keys=['group_a', 'group_b', ...])

Then you could just take the length of the combined frame, e.g. with len(df).
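
As a rough illustration of the combined-frame approach, assuming two small made-up frames in place of your real ones (the column name and group keys are placeholders):

```python
import pandas as pd

# Hypothetical stand-ins for the real frames.
output_1 = pd.DataFrame({"value": [1, 2]})
output_2 = pd.DataFrame({"value": [3, 4, 5]})

# keys= adds an outer index level naming the frame each row came from.
combined = pd.concat([output_1, output_2], keys=["group_a", "group_b"])

# Total number of rows across all frames.
total_rows = len(combined)

# Row count per original frame, grouped on the outer index level.
rows_per_group = combined.groupby(level=0).size()

# Selecting on the outer level recovers one of the original frames.
group_b = combined.loc["group_b"]
```

The outer index level keeps track of which frame each row came from, so nothing is lost by combining them.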

Upvotes: 2
