Kim Yejun
Kim Yejun

Reputation: 61

How to combine multiple dataframes using for loop?

I am trying to merge multiple columns where after one column the following column starts in a specific index. for example, as you can see in the code below, I have 15 sets of data from df20 to df90. As seen in the code, I have merge the data i and then followed by another starting from index = 1,000.

So I wanted my output to be df20 followed by df25 starting at index=1000, then followed by df30 starting at index=2000, then followed by df35 at index=3000. I wanted to see all 15 columns but I only have one column in my output.

I have tried it below, but doesn't seem to work. Please help.

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
for i in dframe:
  a = i.merge((i).set_index((i).index+1000), how='outer', left_index=True, right_index=True)

print(a)

Output:

                      df90_x              df90_y
0                     0.000757                      NaN
1                     0.001435                      NaN
2                     0.002011                      NaN
3                     0.002497                      NaN
4                     0.001723                      NaN
...                        ...                      ...
10995                      NaN             1.223000e-12
10996                      NaN             1.305000e-12
10997                      NaN             1.809000e-12
10998                      NaN             2.075000e-12
10999                      NaN             2.668000e-12

[11000 rows x 2 columns]

Expected Output:

                      df20                 df25                  df30
0                     0.000757             0                     0
1                     0.001435             0                     0
2                     0.002011             0                     0
3                     0.002497             0                     0
4                     0.001723             0                     0
...                  ...                   ...                   ...
1000                                      1.223000e-12           0
1001                                      1.305000e-12           0
1002                                      1.809000e-12           0
1003                                      2.668000e-12           0
...                                                              ...
2000                                                             0.1234
2001                                                             0.4567
2002                                                             0.8901
2003                                                             0.2345

Upvotes: 0

Views: 3179

Answers (2)

Baron Legendre
Baron Legendre

Reputation: 2188

please refer to official page.


Concat multiple dataframes

df1=pd.DataFrame(
        {
            "A":["A0","A1","A2","A3"]
        },
        index=[0, 1, 2, 3]
)
df2=pd.DataFrame(
        {
            "B":["B4","B5"]
        },
        index=[4, 5]
)
df3=pd.DataFrame(
        {
            "C":["C6", "C7", "C8", "C9", "C10"]
        },
        index=[6, 7, 8, 9, 10]
)
result = pd.concat([df1, df2, df3], axis=1)
display(result)

Output:

      A    B    C
0    A0  NaN  NaN
1    A1  NaN  NaN
2    A2  NaN  NaN
3    A3  NaN  NaN
4   NaN   B4  NaN
5   NaN   B5  NaN
6   NaN  NaN   C6
7   NaN  NaN   C7
8   NaN  NaN   C8
9   NaN  NaN   C9
10  NaN  NaN  C10

Import file into a list via looping

method 1: you can create a list to put whole filenames into a list

filenames = ['sample_20.csv', 'sample_25.csv', 'sample_30.csv', ...]
dataframes = [pd.read_csv(f) for f in filenames]

method 1-1: If you do have lots of files then you need a faster way to create the name list

filenames = ['sample_{}.csv'.format(i) for i in range(20, 90, 5)]
dataframes = [pd.read_csv(f) for f in filenames]

method 2:

from glob import glob
filenames = glob('sample*.csv')
dataframes = [pd.read_csv(f) for f in filenames]

Upvotes: 1

I'mahdi
I'mahdi

Reputation: 24049

you can try this code, if you want variable for num_dataframe , length_dataframe:

import pandas as pd
import random

dframe = list()
num_dataframe = 3
len_dataframe = 5

for i in range((num_dataframe)):
    dframe.append(pd.DataFrame({i:[random.randrange(1, 50, 1) for i in range(len_dataframe)]},
                               index=range(i*len_dataframe, (i+1)*len_dataframe)))


result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)

result.fillna(0)

output:

enter image description here

and for your question, you want 20 data frame with 1000 length, you can try this:

import pandas as pd
import random

dframe = list()
num_dataframe = 20
len_dataframe = 1000

for i in range((num_dataframe)):
    dframe.append(pd.DataFrame({i:[np.random.random() for i in range(len_dataframe)]},
                               index=range(i*len_dataframe, (i+1)*len_dataframe)))


result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)

result.fillna(0)

output:

enter image description here

as you mentioned in the comment, I edit the post and add this code:

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]

result = pd.concat([dframe[i] for i in range(len(dframe))], axis=0)

result.fillna(0)

Upvotes: 1

Related Questions