Reputation: 773
I have many CSV files in a directory, each with two columns:
miRNA read_counts
miR1 10
miR1 5
miR2 2
miR2 3
miR3 100
I would like to sum read_counts if the miRNA id is the same.
Result:
miRNA read_counts
miR1 15
miR2 5
miR3 100
To do that I wrote a little script. However, I don't know how to loop it over all my CSV files so that I don't have to copy-paste file names and output names each time. Any help would be greatly appreciated. Thanks!
import pandas as pd

df = pd.read_csv("modified_LC1a_miRNA_expressed.csv")
# sum the read counts for each miRNA id
df_new = df.groupby('miRNA')['read_counts'].sum()
print(df_new)
df_new.to_csv('sum_LC1a_miRNA_expressed.csv')
Upvotes: 0
Views: 44
Reputation: 490
Not trying to steal the answer. I would have put this in a comment under @Asif Ali's answer if I had enough rep.
Assuming all input .csv files follow the format "modified_{rest_of_the_file_name}.csv" and you want the outputs to be named "sum_{same_rest_of_the_file_name}.csv":
import os
import glob
import pandas as pd

path = "./your/path"
files = glob.glob(os.path.join(path, "*.csv"))

for file in files:
    df = pd.read_csv(file)
    df_new = df.groupby('miRNA')['read_counts'].sum()
    print(df_new)
    # swap the "modified" prefix for "sum" to build the output file name
    head, tail = file.split('modified', 1)
    df_new.to_csv(head + 'sum' + tail)
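As a side note, globbing for "modified*.csv" instead of "*.csv" keeps previously written sum_*.csv files from being picked up on a second run, and os.path can restrict the prefix swap to the file name itself. A minimal sketch (the path is illustrative):

import os
import glob

files = glob.glob(os.path.join("./your/path", "modified*.csv"))

for file in files:
    # replace the "modified" prefix in the base name only, leaving the directory untouched
    dirname, basename = os.path.split(file)
    out_name = os.path.join(dirname, basename.replace('modified', 'sum', 1))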
Upvotes: 0
Reputation: 1432
Try looking into the glob module.
from glob import glob
import os
import pandas as pd

path = "./your/path"
files = glob(os.path.join(path, "*.csv"))

dataframes = []
for file in files:
    df = pd.read_csv(file)
    # collect each file's frame so they can be combined afterwards
    dataframes.append(df)
Then, use pd.concat to join the dataframes and perform the groupby operation.
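A minimal sketch of that step, reusing the dataframes list built above (the column name follows the question's sample data):

import pandas as pd

# stack all per-file tables, then sum the counts for each miRNA across every file
combined = pd.concat(dataframes)
totals = combined.groupby('miRNA')['read_counts'].sum()
print(totals)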
EDIT 1: Based on the request mentioned in the comment:
results = {}
for file in files:
    df = pd.read_csv(file)
    # perform the groupby sum for this file and keep the result keyed by file name
    df_new = df.groupby('miRNA')['read_counts'].sum()
    results[file] = df_new
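If each per-file result should also end up on disk, the loop can finish with a to_csv call; the output naming below is just one possibility:

# write each summed table next to its source file (the "_sum" suffix is an assumption)
for file, df_new in results.items():
    df_new.to_csv(file.replace('.csv', '_sum.csv'))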
Upvotes: 1