Reputation: 11
So I'm trying to use Python to analyze images of cells taken in multiple wells of a 96 well plate. I was able to generate a csv of my data from said images from each field and group them by well. I wrote a simple script to combine all 4 CSVs (I imaged 4 fields/well) in each well and generate a histogram of the values of one column from the combined CSV, and it works!
However, it only works if I specify the path each time - so I'd have to run it 96 separate times and manually change the path.
#combine csvs in one well
import pandas as pd
import glob
import os
import matplotlib.pyplot as plt
# Get a list of all CSV files in any given directory
path = "/path/to/well/folder"
csv_files = glob.glob(os.path.join(path, "*.csv"))
# Create an empty list to store dataframes
df_list = []
# Read each CSV file and append it to the list
for file in csv_files:
    df = pd.read_csv(file)
    df_list.append(df)
# Concatenate all dataframes into one
combined_df = pd.concat(df_list, ignore_index=True)
# Save the combined dataframe to a new CSV file in the same place as
combined_df.to_csv('path/to/well/folder/combined.csv', index=False)
#export descriptive stats
fig=combined_df['Mean'].hist()
plt.savefig("path/to/well/folder/fig.png")
I tried to write a program that would iterate this over all the subfolders (named for each well of the plate) in the directory named "results", but it didn't work and I'm not sure why. Here's what I have so far. Any help would be greatly appreciated.
Crucially, the end goal is for the 4 CSVs in each subfolder to become one merged CSV in EACH subfolder, NOT one big merged CSV representing the data from ALL FOLDERS combined. This is why I couldn't find an example elsewhere online that matched what I was trying to do.
import pandas as pd
import glob
import os
import matplotlib.pyplot as plt
rootdir = "path/to/folder/of/well/folders"
subfolderlist = os.listdir(rootdir)
print(subfolderlist)
for i in subfolderlist:
    if not i.startswith('.'):
        print(os.listdir(os.path.join(rootdir,i)))
        csv_files = glob.glob(i, "*.csv")
        df_list = []
        for file in csv_files:
            df = pd.read_csv(file)
            df_list.append(df)
        combined_df = pd.concat(df_list, ignore_index=True)
        combined_df.to_csv(i + '/combined.csv', index=False)
        fig=combined_df['Mean'].hist()
        plt.savefig(i + '/fig.png')
The line "if not i.startswith('.'):" is there to skip the .DS_Store file that also lives in this larger folder, which fixed an earlier error. The error I get now is:
csv_files = glob.glob(i, "*.csv")
TypeError: glob() takes 1 positional argument but 2 were given
But I only gave it one positional argument - "i" , which should represent each subdirectory within the main directory, right? I'm not sure why this, which worked perfectly when I used (path, "*.csv") to run the file in one folder, where path was assigned to one directory in particular, isn't working now when I'm trying to tell it to iterate through multiple directories.
Any help?
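A note on the TypeError, in case it helps later readers: glob.glob accepts a single positional argument, the pattern string. The single-folder script worked because os.path.join(path, "*.csv") had already merged the folder and the pattern into one string before glob.glob saw it. Separately, os.listdir returns bare names, so each subfolder name needs rootdir prepended. Below is a minimal sketch of that fix, using a throwaway temporary folder; the well names (A1, A2) and file names are made up for illustration.

```python
import glob
import os
import tempfile

# Build a throwaway layout that mimics results/<well>/<field>.csv.
# The well names and file names here are invented for the example.
rootdir = tempfile.mkdtemp()
for well in ("A1", "A2"):
    os.makedirs(os.path.join(rootdir, well))
    for field in range(4):
        with open(os.path.join(rootdir, well, f"field{field}.csv"), "w") as fh:
            fh.write("Mean\n1\n")

found = {}
for name in sorted(os.listdir(rootdir)):
    # os.listdir returns bare names, so prepend rootdir to get a usable path
    well_dir = os.path.join(rootdir, name)
    if name.startswith(".") or not os.path.isdir(well_dir):
        continue
    # glob.glob takes ONE pattern string; os.path.join builds it from the parts
    found[name] = sorted(glob.glob(os.path.join(well_dir, "*.csv")))

for name, files in found.items():
    print(name, len(files))
```

The same well_dir can then be used for the output paths, for example os.path.join(well_dir, "combined.csv"), instead of i + '/combined.csv'.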
Upvotes: 1
Views: 55
Reputation: 11
For anyone looking for an update, I figured out how to do this in a very easy way using the os.walk function.
import os
import pandas as pd
import matplotlib.pyplot as plt
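root_path = 'path/to/folder/of/well/folders'
for root, sub, files in os.walk(root_path):
    # Full paths of the CSVs directly inside this folder; skip a combined.csv left by an earlier run
    filenames = [os.path.join(root, filename) for filename in files
                 if filename.endswith('.csv') and filename != 'combined.csv']
    if not filenames:
        continue  # e.g. the top-level folder itself; pd.concat raises on an empty list
    flist = []
    plt.clf()  # start each well with an empty figure
    for filename in filenames:
        print(filename)
        df = pd.read_csv(filename)
        flist.append(df)
    df_out = pd.concat(flist)
    # Append a summary row holding the mean of every numeric column
    df_out.loc['mean of cells'] = df_out.mean(numeric_only=True)
    df_out.to_csv(os.path.join(root, 'combined.csv'))
    fig = df_out['Mean'].hist(bins=[0, 100, 500, 1000, 2000, 3000, 4000])
    plt.savefig(os.path.join(root, 'fig'))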
As you all can probably see, I modified this to also output a histogram of the combined data for each well and to add a line at the bottom where all the info is averaged.
The plt.clf() call at the start of each folder's iteration clears the current matplotlib figure, so each saved figure is a histogram of that subfolder's data only. Without it, the histograms from successive subfolders were drawn on top of each other, which was not ideal.
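An alternative to clearing one shared figure is to create a fresh figure for each well and close it after saving. This is only a sketch under assumptions: the values below are invented stand-ins for the 'Mean' column, the output goes to a temporary folder, and the Agg backend is selected so that no display is needed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up values standing in for the 'Mean' column of two wells.
wells = {"A1": [50, 120, 800, 1500], "A2": [90, 300, 2500, 3500]}
bins = [0, 100, 500, 1000, 2000, 3000, 4000]

out_dir = tempfile.mkdtemp()
saved = []
for well, values in wells.items():
    fig, ax = plt.subplots()   # a new, empty figure for this well
    ax.hist(values, bins=bins)
    ax.set_title(well)
    out_path = os.path.join(out_dir, f"{well}.png")
    fig.savefig(out_path)
    plt.close(fig)             # release the figure so nothing carries over
    saved.append(out_path)
```

With a DataFrame, the equivalent plotting call would be df_out['Mean'].hist(ax=ax, bins=bins), which draws onto the axes created for that well.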
Thanks for your help, particularly in alerting me to the os.walk function!
Upvotes: 0
Reputation: 182
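pathlib makes this simpler: iterdir() yields full paths, and is_dir() skips stray files such as .DS_Store.
from pathlib import Path
parent_folder = Path("path/to/folder/of/well/folders")
for folder in parent_folder.iterdir():
    if folder.is_dir():
        # Full paths of every CSV directly inside this well folder
        csv_files = list(folder.glob("*.csv"))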
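One thing to watch when the merged file is written back into the same folder: on a second run, "*.csv" also matches the combined.csv from the first run, so the data would be counted twice. Here is a small sketch of filtering it out with pathlib, again on a throwaway temporary layout with invented well and file names.

```python
from pathlib import Path
import tempfile

# Throwaway layout; the well and file names are invented for the example.
parent_folder = Path(tempfile.mkdtemp())
for well in ("B1", "B2"):
    (parent_folder / well).mkdir()
    for field in range(4):
        (parent_folder / well / f"field{field}.csv").write_text("Mean\n1\n")
    # Simulate output left behind by an earlier run.
    (parent_folder / well / "combined.csv").write_text("Mean\n1\n")
(parent_folder / ".DS_Store").write_text("")  # stray file, not a folder

inputs = {}
for folder in sorted(parent_folder.iterdir()):
    if not folder.is_dir():
        continue  # skips .DS_Store and any other loose files
    # Keep only the per-field CSVs, not the merged output
    inputs[folder.name] = sorted(
        p for p in folder.glob("*.csv") if p.name != "combined.csv"
    )
```

Each list in inputs can then be passed to pd.read_csv and pd.concat as in the question, with folder / "combined.csv" as the output path.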
Upvotes: 0