ARJ

Reputation: 2080

Python for merging multiple files from a directory into one single file

I need to combine multiple files in a directory into a single file with one value column per file, so the number of columns equals the number of files in the directory. Every file contains the same set of unique IDs, so I need to merge the files on that id.

For example, file_1 looks like this:

id      pool1
ABL1    1352
ABL12   1236
ABL13   1022
ABL14   815
ABL15   1591
ABL16   2703

The other files have the same layout: the first column (id) is identical in every file in the directory, and only the second column differs.

I am looking for output that looks something like this:

id      pool1   pool2   pool3   pool4   pool5
ABL1    1352    1353    1354    1355    1356
ABL12   1236    1237    1238    1239    1240
ABL13   1022    1023    1024    1025    1026
ABL14   815     816     817     818     819
ABL15   1591    1592    1593    1594    1595
ABL16   2703    2704    2705    2706    2707
ABL17   1449    1450    1451    1452    1453
ABL18   619     620     621     622     623
ABL19   1074    1075    1076    1077    1078

So far I have tried to do this in Python with the following script:

import os

path = '/Pool1'
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt_samplecount')]
# Each i is a file name (a str), not a DataFrame, so this line raises the error below
files_merge = [i.merge(i, on="id") for i in files_txt]

But it throws this error:
AttributeError: 'str' object has no attribute 'merge'
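The error occurs because `os.listdir` returns file names as plain strings, and `str` has no `merge` method; each file has to be read into a pandas DataFrame before it can be merged. A minimal sketch of the merge-on-id approach described above, assuming tab-separated files whose first column is named `id` (the function name `merge_on_id` is hypothetical):

```python
import os
from functools import reduce

import pandas as pd


def merge_on_id(path, suffix=".txt_samplecount"):
    """Read every matching file in `path` and merge them on the 'id' column."""
    # os.listdir returns plain strings, so each name is turned into a
    # DataFrame first; sorting gives a stable column order between runs.
    names = sorted(n for n in os.listdir(path) if n.endswith(suffix))
    # Assumption: the files are tab-separated with a header row like "id<TAB>pool1".
    frames = [pd.read_csv(os.path.join(path, n), sep="\t") for n in names]
    # Fold the list pairwise: ((df1 merge df2) merge df3) ...
    # how="outer" keeps ids missing from some files (their values become NaN).
    return reduce(
        lambda left, right: pd.merge(left, right, on="id", how="outer"), frames
    )
```

Usage would be `merged = merge_on_id('/Pool1')`. Since every file here has the same ids, `how="inner"` would give the same rows; `reduce` raises a `TypeError` if no files match, because it has nothing to start from.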

Any help or suggestions are welcome.

Thank you

Upvotes: 0

Views: 2054

Answers (1)

ARJ

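I found a solution:

import os

import pandas as pd

path = '/Pool1'
files = os.listdir(path)

# Keep only the sample-count files, as full paths
files_txt = [os.path.join(path, i) for i in files if i.endswith('.txt_samplecount')]

## Change it into dataframe
# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0
# is the modern equivalent here (the first column, id, becomes the index)
dfs = [pd.read_csv(x, sep='\t', index_col=0) for x in files_txt]
## Concatenate it (columns are aligned on the id index)
merged = pd.concat(dfs, axis=1)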

This gives a single DataFrame in which each file contributes one column, with the rows aligned on the id index. Thanks for the suggestions, all.
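Two details the snippet leaves open: `os.listdir` returns names in arbitrary order, so the column order can change between runs, and `merged` still has to be written to disk to get the single output file. A minimal sketch handling both, assuming tab-separated input files; `concat_pool_files` and the output path are hypothetical names:

```python
import glob
import os

import pandas as pd


def concat_pool_files(path, out_file, suffix=".txt_samplecount"):
    """Align all matching files on their first column and write one combined file."""
    # glob (like os.listdir) returns names in arbitrary order; sort for a stable order.
    files = sorted(glob.glob(os.path.join(path, "*" + suffix)))
    # index_col=0 makes 'id' the index, which pd.concat(axis=1) aligns rows on.
    dfs = [pd.read_csv(f, sep="\t", index_col=0) for f in files]
    merged = pd.concat(dfs, axis=1)
    # Write one tab-separated file; the index is written as the first column.
    merged.to_csv(out_file, sep="\t")
    return merged
```

Usage would look like `concat_pool_files('/Pool1', '/tmp/merged.tsv')`. Keep the output file outside the input directory (or give it another suffix) so a second run does not read it back in as input. Note that `sorted` orders names lexicographically, so `file_10` sorts before `file_2`.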

Upvotes: 2
