Python; write dataframe output to different subdirectories

Question

I am running my script from my current working directory. With my script, I loop over the subdirectories of the current working directory. Each subdirectory contains the 3 files mentioned in the script, and for each subdirectory, I merge the 3 files to one dataframe. As my script is now, it writes the merged dataframe of only one subdirectory to the current working directory. What I want is the csv file with the merged dataframe of each subdirectory saved in that subdirectory, or a file with dataframes of each subdirectory concatenated to one large output file. With my script, I have only the output of one subdirectory in the output file.

My script is as follows:

print('Start merging contig files')

for root, dirs, files in os.walk(os.getcwd()):
    filepath = os.path.join(root, 'genes.faa.genespercontig.csv')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f1:
            df1 = pd.read_csv(f1, header=None, delim_whitespace=True, names = ["contig", "genes"])
            df1['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'hmmer.analyze.txt.results.txt')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f2:
            df2 = pd.read_csv(f2, header=None, delim_whitespace=True, names = ["contig", "SCM"])
            df2['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'genes.fna.output_blastplasmiddb.out.count_plasmiddbhit.out')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f3:
            df3 = pd.read_csv(f3, header=None, delim_whitespace=True, names = ["contig", "plasmid_genes"])
            df3['genome'] = os.path.basename(os.path.dirname(filepath))

#merge dataframes
dfmerge1 = pd.merge(df1, df2, on=['genome', 'contig'], how='outer')
df_end = pd.merge(dfmerge1, df3, on=['genome', 'contig'], how='outer')      

df_end.to_csv('outputgenesdf.csv')

MaxU - stand with Ukraine · Accepted Answer

Try this:

df_end.to_csv(os.path.join(root, 'outputgenesdf.csv'))

PS make sure that this command is in the for loop

print('Start merging contig files')

for root, dirs, files in os.walk(os.getcwd()):
    filepath = os.path.join(root, 'genes.faa.genespercontig.csv')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f1:
            df1 = pd.read_csv(f1, header=None, delim_whitespace=True, names = ["contig", "genes"])
            df1['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'hmmer.analyze.txt.results.txt')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f2:
            df2 = pd.read_csv(f2, header=None, delim_whitespace=True, names = ["contig", "SCM"])
            df2['genome'] = os.path.basename(os.path.dirname(filepath))

    filepath = os.path.join(root, 'genes.fna.output_blastplasmiddb.out.count_plasmiddbhit.out')
    if os.path.isfile(filepath):
        with open(filepath, 'r') as f3:
            df3 = pd.read_csv(f3, header=None, delim_whitespace=True, names = ["contig", "plasmid_genes"])
            df3['genome'] = os.path.basename(os.path.dirname(filepath))

    #merge dataframes
    dfmerge1 = pd.merge(df1, df2, on=['genome', 'contig'], how='outer')
    df_end = pd.merge(dfmerge1, df3, on=['genome', 'contig'], how='outer')      

    df_end.to_csv(os.path.join(root, 'outputgenesdf.csv'))

Python; write dataframe output to different subdirectories

Answers (2)

Related Questions