CC.Shih
CC.Shih

Reputation: 21

Reading multiple CSV files to DataFrames and naming them after their original file name

I had several csv files with same structure and I want

  1. Assign file to dataframe name with the same order : 1.csv -> data1, 2.csv ->data2
  2. And assign columns in the same manner : delta1 = data1["C"] - data1["A"]

I want put it into the for loop, looks like this:

for i in range (1, 22):
    data%i = pd.read_csv('CSV/' + str(i) + '.csv')
    delta%i = data%i["C"] - data%i["A"]

# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data1[data1.delta1 > 0].column) & set(data2[data2.delta2 == 0].column) 
set(data2[data2.delta2 > 0].column) & set(data3[data3.delta3 == 0].column) 

but certainly wrong syntax in the for loop, is there better way to code it so that after the loop, I can get:

data1, data2, data3 ...

with corresponding:

delta1, delta2, delta3 ...

Upvotes: 2

Views: 957

Answers (2)

firelynx
firelynx

Reputation: 32244

You can do everything with native pandas functions as opposed to dicts.

First read your csvs into a list:

df_list = []
for i in range(1, 22):
    df_list.append(pd.read_csv("{i}.csv".format(i)))

Now concat them:

df = pd.concat(df_list, keys=range(1,22))

Now your dataframe df is indexed with the key of the file you loaded.

Doing for example df.loc[1] will get you the data from the file 1.csv

You can now set your deltas witha a single operation:

df["delta"] = df["C"] - df["A"]

And you can access these deltas with the DataFrame.loc operation too, like this:

df.loc[2,"delta"]

This method is more native to pandas and may scale better with larger datasets.

Upvotes: 1

patapouf_ai
patapouf_ai

Reputation: 18743

Here is an implementation of what you want using dictionaries (as also suggested by @EdChum in the comments for example):

data = {}
delta = {}

for i in range (1, 22):
    data[i]= pd.read_csv('CSV/' + str(i) + '.csv')
    delta[i] = data[i]["C"] - data[i]["A"]

# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data[1][data[1].delta[1] > 0].column) & set(data[2][data[2].delta[2] == 0].column) 
set(data[2][data[2].delta[2] > 0].column) & set(data[3][data[3].delta[3] == 0].column) 

I would really recommend using a dictionnary such as above. However if you really really really insist on allocating variables dynamically like you want to do in your questions, you can do the following highly dangerous and not recommended thing:

You can allocate variables dynamically using:

globals()[variable_name]=variable_value

Again you really shouldn't do that. There is no reason to do it either, but here you go, here is the modification of your code which does exactly what you wanted:

for i in range (1, 22):
    datai = "data"+str(i)
    deltai = "delta"+str(i)
    globals()[datai] = pd.read_csv('CSV/' + str(i) + '.csv')
    globals()[deltai] = globals()[datai]["C"] - globals()[datai]["A"]

# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data1[data1.delta1 > 0].column) & set(data2[data2.delta2 == 0].column) 
set(data2[data2.delta2 > 0].column) & set(data3[data3.delta3 == 0].column)

Upvotes: 0

Related Questions