Reputation: 29
I have one main dataframe that contains data for ~25 people with 10 trials per person. I also have individual files for each trial by participant. My goal is to have one file per participant that contains all 10 trials with data from both the main dataframe and the individual files.
I am matching the rows of the main dataframe to the files in the directory by filename (the filename contains both the participant ID and the trial number, e.g. 90-9.csv).
Example of the data:
# Main df:
ID  trial  file       length
90  9      90-9.csv   56
90  10     90-10.csv  44
91  1      91-1.csv   62
91  2      91-2.csv   48
# Individual files in directory- these files contain diameter data:
90-9.csv
90-10.csv
91-1.csv
91-2.csv
# intended output:
ID  trial  file       length  diameter
..  ..     ..         ..      ..
90  8      90-8.csv   62      3.15
90  9      90-9.csv   56      3.17
90  10     90-10.csv  44      3.14
I have tried the following for looping through the directory:
directory = os.chdir(r'filepath')
# create list of files
dir_list = os.listdir(directory)
for file in dir_list:
df = pd.read_csv('mainDF.csv')
# create filename column in main df
df['ID'] = df['ID'].astype(str)
df['trial'] = df['trial'].astype(str)
df['file'] = df['ID']+'-'+df['trial']+'.csv'
# this doesn't work
for file in zoom['filename']:
pupil = pd.read_csv(file)
# this one doesn't organize the data properly
if ([x in file for x in zoom['filename']]):
pupil = pd.read_csv(file)
When I isolate one participant and one trial number, the data is organized the way I show in the intended output. When I loop through the directory, everything becomes out of order. I'm not sure what's going on.
Upvotes: 0
Views: 91
Reputation: 786
os.listdir lists the names of the files (and folders) in the directory you give it, not their full paths. To open one of those files you need to build the path yourself, e.g. directory+"/"+file (or os.path.join(directory, file)). In addition, you're reloading mainDF.csv on every iteration. I don't think you actually want that; you probably want to load it once, before the for loop starts. You also seem to be iterating over your files twice. I don't know what your zoom variable is, since it is never defined in the code you posted, so I can't tell what looping over it is meant to achieve.
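To illustrate the path point, here is a minimal sketch of listing a folder and turning the bare names into usable paths. The helper name list_csv_paths is made up for this example. Note also that os.chdir returns None, so directory in the question's code is None and os.listdir then falls back to the current working directory.

```python
import os


def list_csv_paths(directory):
    """Return the full paths of the .csv files in `directory`, sorted by name.

    `list_csv_paths` is a hypothetical helper, not part of any library.
    """
    paths = []
    # os.listdir returns bare names such as "90-9.csv", in arbitrary order,
    # so sort them and join each one with the directory to get a real path.
    for name in sorted(os.listdir(directory)):
        if name.endswith(".csv"):
            paths.append(os.path.join(directory, name))
    return paths
```

Each returned path can be passed straight to pd.read_csv, regardless of what the current working directory is.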
I would advise rewriting this code with glob instead of listdir. That is safer, because the pattern selects only the files you want, and glob returns paths that already include the directory, so you don't have to build them yourself. Something like this:
import glob
import pandas as pd
df = pd.read_csv('mainDF.csv')
df['ID'] = df['ID'].astype(str)
df['trial'] = df['trial'].astype(str)
df['file'] = df['ID']+'-'+df['trial']+'.csv'
for path in glob.glob("path/to/files/*.csv"):
    pupil_df = pd.read_csv(path)
    # do what you want with pupil_df here
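To go from that loop to one file per participant, one possible approach is sketched below. It rests on assumptions that the question does not confirm: each trial file has a column named diameter, one value per trial is wanted (the mean is used here), and the output files are named <ID>_all_trials.csv. The function names are made up for the example. ID and trial are kept numeric so that trial 10 sorts after trial 9; converting them to strings, as in the question, may be one reason the rows come out in an unexpected order.

```python
from pathlib import Path

import pandas as pd


def collect_diameters(trial_dir):
    """Return one row per trial file: its filename and mean diameter.

    Assumes every trial file has a `diameter` column (not confirmed by the
    question); taking the mean is likewise only an illustrative choice.
    """
    rows = []
    # "*-*.csv" matches names like "90-9.csv" but not "mainDF.csv".
    for path in sorted(Path(trial_dir).glob("*-*.csv")):
        pupil_df = pd.read_csv(path)
        rows.append({"file": path.name, "diameter": pupil_df["diameter"].mean()})
    return pd.DataFrame(rows, columns=["file", "diameter"])


def write_participant_files(main_df, trial_dir, out_dir):
    """Merge main_df with the per-trial diameters; write one CSV per ID."""
    main_df = main_df.copy()
    # Build the filename key without turning ID and trial into strings.
    main_df["file"] = (
        main_df["ID"].astype(str) + "-" + main_df["trial"].astype(str) + ".csv"
    )
    merged = main_df.merge(collect_diameters(trial_dir), on="file", how="left")
    merged = merged.sort_values(["ID", "trial"])

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for participant_id, group in merged.groupby("ID"):
        # Hypothetical output name, e.g. "90_all_trials.csv".
        group.to_csv(out_dir / f"{participant_id}_all_trials.csv", index=False)
    return merged
```

It could be called as write_participant_files(pd.read_csv('mainDF.csv'), 'path/to/files', 'path/to/output'). Because the merge uses how="left", a trial whose file is missing keeps its row with an empty diameter instead of being dropped.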
Upvotes: 1