Reputation: 61
I am trying to merge multiple columns where after one column the following column starts in a specific index. for example, as you can see in the code below, I have 15 sets of data from df20 to df90. As seen in the code, I have merge the data i and then followed by another starting from index = 1,000.
So I wanted my output to be df20 followed by df25 starting at index=1000, then followed by df30 starting at index=2000, then followed by df35 at index=3000. I wanted to see all 15 columns but I only have one column in my output.
I have tried it below, but doesn't seem to work. Please help.
dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
for i in dframe:
a = i.merge((i).set_index((i).index+1000), how='outer', left_index=True, right_index=True)
print(a)
Output:
df90_x df90_y
0 0.000757 NaN
1 0.001435 NaN
2 0.002011 NaN
3 0.002497 NaN
4 0.001723 NaN
... ... ...
10995 NaN 1.223000e-12
10996 NaN 1.305000e-12
10997 NaN 1.809000e-12
10998 NaN 2.075000e-12
10999 NaN 2.668000e-12
[11000 rows x 2 columns]
Expected Output:
df20 df25 df30
0 0.000757 0 0
1 0.001435 0 0
2 0.002011 0 0
3 0.002497 0 0
4 0.001723 0 0
... ... ... ...
1000 1.223000e-12 0
1001 1.305000e-12 0
1002 1.809000e-12 0
1003 2.668000e-12 0
... ...
2000 0.1234
2001 0.4567
2002 0.8901
2003 0.2345
Upvotes: 0
Views: 3179
Reputation: 2188
please refer to official page.
df1=pd.DataFrame(
{
"A":["A0","A1","A2","A3"]
},
index=[0, 1, 2, 3]
)
df2=pd.DataFrame(
{
"B":["B4","B5"]
},
index=[4, 5]
)
df3=pd.DataFrame(
{
"C":["C6", "C7", "C8", "C9", "C10"]
},
index=[6, 7, 8, 9, 10]
)
result = pd.concat([df1, df2, df3], axis=1)
display(result)
Output:
A B C
0 A0 NaN NaN
1 A1 NaN NaN
2 A2 NaN NaN
3 A3 NaN NaN
4 NaN B4 NaN
5 NaN B5 NaN
6 NaN NaN C6
7 NaN NaN C7
8 NaN NaN C8
9 NaN NaN C9
10 NaN NaN C10
method 1: you can create a list to put whole filenames into a list
filenames = ['sample_20.csv', 'sample_25.csv', 'sample_30.csv', ...]
dataframes = [pd.read_csv(f) for f in filenames]
method 1-1: If you do have lots of files then you need a faster way to create the name list
filenames = ['sample_{}.csv'.format(i) for i in range(20, 90, 5)]
dataframes = [pd.read_csv(f) for f in filenames]
method 2:
from glob import glob
filenames = glob('sample*.csv')
dataframes = [pd.read_csv(f) for f in filenames]
Upvotes: 1
Reputation: 24049
you can try this code, if you want variable for num_dataframe
, length_dataframe
:
import pandas as pd
import random
dframe = list()
num_dataframe = 3
len_dataframe = 5
for i in range((num_dataframe)):
dframe.append(pd.DataFrame({i:[random.randrange(1, 50, 1) for i in range(len_dataframe)]},
index=range(i*len_dataframe, (i+1)*len_dataframe)))
result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)
result.fillna(0)
output:
and for your question, you want 20 data frame with 1000 length, you can try this:
import pandas as pd
import random
dframe = list()
num_dataframe = 20
len_dataframe = 1000
for i in range((num_dataframe)):
dframe.append(pd.DataFrame({i:[np.random.random() for i in range(len_dataframe)]},
index=range(i*len_dataframe, (i+1)*len_dataframe)))
result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)
result.fillna(0)
output:
as you mentioned in the comment, I edit the post and add this code:
dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
result = pd.concat([dframe[i] for i in range(len(dframe))], axis=0)
result.fillna(0)
Upvotes: 1