Reputation: 1751
I am very new to Python, and this is probably a simple question, but I cannot seem to find a solution.
I have several pandas data frames named output_1, output_2, ..., output_n.
I want to sum their lengths (that is, their numbers of rows), and I came up with something like this:
total = 0
for num in range(1, n + 1):
    nameframe = "output_" + str(num)
    total += nameframe.shape[0]  # AttributeError: 'str' object has no attribute 'shape'
The problem is that Python sees nameframe as a string, not as the name of a dataframe.
Looking around I found a potential solution:
total = 0
for num in range(1, n + 1):
    x = globals()["output_%s" % num]
    total += x.shape[0]
This seems to work; however, using globals() seems to be strongly discouraged. What is the most Pythonic way to achieve this?
Upvotes: 0
Views: 2152
Reputation: 52266
The most pythonic way would probably be to store your dataframes in a list. E.g.,
dfs = [output_1, output_2, ...]
df_length = sum(x.shape[0] for x in dfs)
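If the frames are created in a loop, it may be easier to put them into a container from the start rather than into numbered variables. The sketch below assumes a dict keyed by the frame number; the name `outputs` and the toy frames are hypothetical stand-ins for the real output_1 ... output_n.

```python
import pandas as pd

# Hypothetical stand-ins for output_1 ... output_3: frame `num` has `num` rows.
# In real code each frame would come from wherever it is actually built.
outputs = {
    num: pd.DataFrame({"value": list(range(num))})
    for num in range(1, 4)
}

# Sum the row counts directly; no name lookup through globals() is needed.
total_rows = sum(len(df) for df in outputs.values())
print(total_rows)

# An individual frame is still reachable by its number.
second = outputs[2]
```

A dict keeps the original numbering available as keys; a plain list works just as well if the numbers themselves do not matter.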
Alternatively, you could look at storing your data in a combined pandas data structure, assuming they are all related in some way. E.g., if each dataframe is a different group, you could set a MultiIndex on the combined frame, like
df = pd.concat([output_1, output_2, ...], keys=['group_a', 'group_b', ...])
Then you could just take the length of the combined frame, e.g. len(df).
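As a rough illustration of the combined-frame approach, the sketch below uses two small made-up frames; the column name `value` and the data are assumptions, and only the `keys` labels follow the example above.

```python
import pandas as pd

# Hypothetical example frames standing in for output_1 and output_2.
output_1 = pd.DataFrame({"value": [1, 2]})
output_2 = pd.DataFrame({"value": [3, 4, 5]})

# `keys` adds an outer index level, so each row remembers its source frame.
combined = pd.concat([output_1, output_2], keys=["group_a", "group_b"])

# Total number of rows across all frames.
total_rows = len(combined)

# Per-frame row counts can be recovered from the outer index level.
rows_per_group = combined.groupby(level=0).size()
print(total_rows)
print(rows_per_group)
```

Note that concatenating copies the data, so this is mainly worthwhile when the combined frame is useful for later analysis and not just for counting rows.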
Upvotes: 2