florence-y

Reputation: 871

Read multiple csv files with Pandas and assign different names

I am inside a directory with a series of .csv files that I would like to assign to their own variable.

The idea is that I want to tidy up each dataframe on its own within a loop, then concatenate everything at the end. My non-"loopified" code is a series of drop, rename, and group-by/pivot commands, which I wrote out once because all the .csv files have the same layout.

The last step in writing my loop is to read the set of .csv files one at a time. The csv files are named:

  1. 100001_t0.csv
  2. 100001_t1.csv
  3. 100001_t2.csv
  4. 100002_t0.csv

... and so on until 100009_t2.csv

In the loop below, filename is the name of the csv file and subjid is the alphanumeric ID before the .csv extension.

I have tried exec("{0}_df = pd.read_csv(filename)".format(subjid)), but I get an invalid token error. Is there a way to change the format portion of this line so that each dataframe is assigned to its own variable, named by its subjid?

Thanks!

for filename in os.listdir(volume_statistics_directory):
    f = os.path.join(volume_statistics_directory, filename)
    if os.path.isfile(f):
        subjid = filename[0:9]
        #print(subjid)
        #print(f)
        print(filename, "being read in...")
        print("\n")
        exec("{0}_df = pd.read_csv(filename)".format(subjid))
        df = pd.read_csv(filename)


100001_t0.csv being read in...


Traceback (most recent call last):

  File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-109-ceed2fd80975>", line 9, in <module>
    exec("{0}_df = pd.read_csv(filename)".format(subjid))

  File "<string>", line 1
    100001_t0_df = pd.read_csv(filename)
          ^
SyntaxError: invalid token

Upvotes: 0

Views: 1840

Answers (1)

shadowtalker

Reputation: 13913

The error happens because a Python variable name cannot start with a digit, so 100001_t0_df is not a valid identifier. Your code would have worked otherwise.
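
As a small sketch of that rule, a candidate name can be checked before it is used. The helper name is_valid_variable_name is hypothetical; it only combines the standard str.isidentifier() method with keyword.iskeyword().

```python
import keyword


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` could be used as a Python variable name."""
    # isidentifier() rejects names that start with a digit;
    # iskeyword() rejects reserved words such as "for" or "class".
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_variable_name("100001_t0_df"))  # False: starts with a digit
print(is_valid_variable_name("df_100001_t0"))  # True: starts with a letter
```

Moving the prefix to the front (df_100001_t0) would make the name legal, though it still relies on building variable names from strings.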

However, constructing variable names from strings is usually a bad idea. Use a dict instead:

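# `files` is assumed to be a list of paths to the .csv files
dfs = {}
for f in files:
    dfs[f] = pd.read_csv(f)  # look up each dataframe later with dfs[f]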
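
Applied to the loop in the question, the dict approach might look roughly like the sketch below. The function names load_subject_frames and combine_frames are hypothetical, and the sketch assumes every .csv file in the directory should be read and that the file name without its extension is a suitable key.

```python
import os

import pandas as pd


def load_subject_frames(directory):
    """Read every .csv file in `directory` into a dict keyed by subject ID."""
    dfs = {}
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if os.path.isfile(path) and filename.endswith(".csv"):
            # "100001_t0.csv" -> "100001_t0"
            subjid = os.path.splitext(filename)[0]
            # Read via the full path so this works from any working directory.
            dfs[subjid] = pd.read_csv(path)
    return dfs


def combine_frames(dfs):
    """Stack the per-subject frames, keeping each row's subject ID as a column."""
    # The dict keys become an index level named "subjid", which is then
    # moved back into an ordinary column.
    return pd.concat(dfs, names=["subjid"]).reset_index(level="subjid")
```

With this, dfs = load_subject_frames(volume_statistics_directory) gives access to a single frame as dfs["100001_t0"]. Any per-file cleanup can be applied to each value in the dict before calling combine_frames(dfs) to concatenate everything at the end.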

Upvotes: 4
