Gabriela Lima
Gabriela Lima

Reputation: 27

Get values from specific columns in different csv files, concat. and create column based on filename in python

I have multiple (more than 100) files like this:

filename: 00.csv

residue, vwd, total  
AAA,0.00, -9.45  
BBB, 0.45, -1.45  
CCC, 0.44, -3    
DDD, 0.1, -10 

filename: 01.csv

residue, vwd, total  
AAA, 2, -0.56  
BBB, -4, -9.32  
CCC, 2.54, -10  
DDD, 3, -6.4  

...

I would like to create a matrix in a new csv file where the first column is "residue", and the others are based on filename (without extension). Below filename, it should be values from "total" column. It would be like this:

residue, 00, 01, ...      
AAA, -9.45, -0,56, ...  
BBB, -1.45, -9.32, ...  
CCC, -3, -10,...  
DDD,  -10, -6.4, ... 

. . .

Thanks in advance!

Upvotes: 0

Views: 578

Answers (2)

user17242583
user17242583

Reputation:

This will work:

files = ['00.csv', '01.csv']

dfs = []
for file in files:
    df = pd.read_csv(file)
    df.columns = df.columns.str.strip()
    df = df[['residue', 'total']].rename({'total': os.path.splitext(file)[0]}, axis=1)
    dfs.append(df)

df = cols[0]
for sub_df in cols[1:]:
    df = df.merge(sub_df, on='residue')

Output:

>>> df
  residue     00     01
0     AAA  -9.45  -0.56
1     BBB  -1.45  -9.32
2     CCC  -3.00 -10.00
3     DDD -10.00  -6.40

Upvotes: 1

Yefet
Yefet

Reputation: 2086

Lets try read the files then pivot table

first get all csv files in that dir

import os

#current directory
path = "."
file_names = list(filter(lambda x:x.endswith(".csv"),os.listdir(path)))

print(file_names)
['filename: 00.csv', 'filename: 01.csv']
df = pd.DataFrame()

for file in file_names:
   df = df.append(pd.concat([pd.read_csv(file),pd.DataFrame([file]*4,
   columns= ["filename"])],axis=1))

# to extract digit from filename
df["filename"] =df["filename"].str.extract('(\d+)')

print(df)
  residue vwd  total  filename
0 AAA    0.00  -9.45  00
1 BBB    0.45  -1.45  00
2 CCC    0.44  -3.00  00
3 DDD    0.10  -10.00 00
0 AAA    2.00  -0.56  01
1 BBB   -4.00  -9.32  01
2 CCC    2.54  -10.00 01
3 DDD    3.00  -6.40  01

Then pivot_table will do the job

df.pivot_table(values='total', index='residue', columns='filename')

Final output:

filename  00   01
residue       
AAA          -9.45 -0.56
BBB          -1.45 -9.32
CCC          -3.00 -10.00
DDD          -10.00 -6.40

Upvotes: 0

Related Questions