Reputation: 2080
I need to build a single file with many columns (one per file in the directory) from multiple files in a directory. Every file has the same unique IDs, which do not change from file to file, so I need to merge the files on that id.
For example, file_1 looks like this:
id pool1
ABL1 1352
ABL12 1236
ABL13 1022
ABL14 815
ABL15 1591
ABL16 2703
The other files look the same: the first column is identical across all files in the directory, and only the second column differs.
I am looking for an output that looks something like this:
id /pool1 /pool2 /pool3 /pool4 /pool5
ABL1 1352 1353 1354 1355 1356
ABL12 1236 1237 1238 1239 1240
ABL13 1022 1023 1024 1025 1026
ABL14 815 816 817 818 819
ABL15 1591 1592 1593 1594 1595
ABL16 2703 2704 2705 2706 2707
ABL17 1449 1450 1451 1452 1453
ABL18 619 620 621 622 623
ABL19 1074 1075 1076 1077 1078
So far I have been trying to achieve this in Python with the following script:
path = '/Pool1'
files = os.listdir(path)
files_txt = [i for i in files if i.endswith('.txt_samplecount')]
files_merge= i for i in files_txt if i.merge(i,on="id")
But it throws this error:
AttributeError: 'str' object has no attribute 'merge'
Any help or suggestions are welcome
Thank you
Upvotes: 0
Views: 2054
Reputation: 2080
I found a solution:
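import os
import pandas as pd

path = '/Pool1'
files = os.listdir(path)
files_txt = [os.path.join(path, i) for i in files if i.endswith('.txt_samplecount')]
## Read each file into a dataframe, using the first column (id) as the index
## (pd.DataFrame.from_csv was removed in pandas 1.0; pd.read_csv replaces it)
dfs = [pd.read_csv(x, sep='\t', index_col=0) for x in files_txt]
## Concatenate column-wise; rows are aligned on the id index
merged = pd.concat(dfs, axis=1)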
This gives an output in which the columns from every file are concatenated into a single table. Thanks for the suggestions, all.
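For anyone who wants the merge on id spelled out explicitly, here is a rough sketch of an alternative using pd.merge with functools.reduce. It is not what I ran above, only an illustration: the helper name merge_on_id, the temporary directory, and the two tiny sample files are made up for the demo, and it assumes every file is tab-separated with a header row containing an id column.

```python
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_on_id(paths, sep="\t"):
    """Outer-merge every file in `paths` on its shared 'id' column.

    Hypothetical helper for illustration; assumes each file has an 'id' header.
    """
    frames = [pd.read_csv(p, sep=sep) for p in paths]
    # Fold the list of dataframes into one, joining pairwise on 'id'.
    return reduce(
        lambda left, right: pd.merge(left, right, on="id", how="outer"),
        frames,
    )


# Made-up demo data: two small tab-separated files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
samples = {
    "a.txt_samplecount": "id\tpool1\nABL1\t1352\nABL12\t1236\n",
    "b.txt_samplecount": "id\tpool2\nABL12\t1237\nABL1\t1353\n",
}
for name, text in samples.items():
    with open(os.path.join(tmp_dir, name), "w") as handle:
        handle.write(text)

# Collect the matching files in a stable order, then merge them.
paths = sorted(
    os.path.join(tmp_dir, name)
    for name in os.listdir(tmp_dir)
    if name.endswith(".txt_samplecount")
)
merged = merge_on_id(paths)
print(merged)
```

The outer join keeps an id even if it is missing from some file (the gap shows up as NaN), and the rows are matched by id rather than by position, so the files do not need to list the ids in the same order.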
Upvotes: 2