Reputation: 21
I had several csv files with same structure and I want
1.csv -> data1, 2.csv ->data2
delta1 = data1["C"] - data1["A"]
I want put it into the for loop, looks like this:
for i in range (1, 22):
data%i = pd.read_csv('CSV/' + str(i) + '.csv')
delta%i = data%i["C"] - data%i["A"]
# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data1[data1.delta1 > 0].column) & set(data2[data2.delta2 == 0].column)
set(data2[data2.delta2 > 0].column) & set(data3[data3.delta3 == 0].column)
but certainly wrong syntax in the for loop, is there better way to code it so that after the loop, I can get:
data1, data2, data3
...
with corresponding:
delta1, delta2, delta3
...
Upvotes: 2
Views: 957
Reputation: 32244
You can do everything with native pandas functions as opposed to dicts.
First read your csvs into a list:
df_list = []
for i in range(1, 22):
df_list.append(pd.read_csv("{i}.csv".format(i)))
Now concat them:
df = pd.concat(df_list, keys=range(1,22))
Now your dataframe df
is indexed with the key of the file you loaded.
Doing for example df.loc[1]
will get you the data from the file 1.csv
You can now set your deltas witha a single operation:
df["delta"] = df["C"] - df["A"]
And you can access these deltas with the DataFrame.loc
operation too, like this:
df.loc[2,"delta"]
This method is more native to pandas
and may scale better with larger datasets.
Upvotes: 1
Reputation: 18743
Here is an implementation of what you want using dictionaries (as also suggested by @EdChum in the comments for example):
data = {}
delta = {}
for i in range (1, 22):
data[i]= pd.read_csv('CSV/' + str(i) + '.csv')
delta[i] = data[i]["C"] - data[i]["A"]
# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data[1][data[1].delta[1] > 0].column) & set(data[2][data[2].delta[2] == 0].column)
set(data[2][data[2].delta[2] > 0].column) & set(data[3][data[3].delta[3] == 0].column)
I would really recommend using a dictionnary such as above. However if you really really really insist on allocating variables dynamically like you want to do in your questions, you can do the following highly dangerous and not recommended thing:
You can allocate variables dynamically using:
globals()[variable_name]=variable_value
Again you really shouldn't do that. There is no reason to do it either, but here you go, here is the modification of your code which does exactly what you wanted:
for i in range (1, 22):
datai = "data"+str(i)
deltai = "delta"+str(i)
globals()[datai] = pd.read_csv('CSV/' + str(i) + '.csv')
globals()[deltai] = globals()[datai]["C"] - globals()[datai]["A"]
# And I want to compare the 2 series from dataframe.column to find a set intersection
set(data1[data1.delta1 > 0].column) & set(data2[data2.delta2 == 0].column)
set(data2[data2.delta2 > 0].column) & set(data3[data3.delta3 == 0].column)
Upvotes: 0