Reputation: 41
I have a dataset that looks like this:
Zn Pb Ag Cu Mo Cr Ni Co Ba
87 7 0.02 42 2 57 38 14 393
70 6 0.02 56 2 27 29 20 404
75 5 0.02 69 2 44 23 17 417
70 6 0.02 54 1 20 19 12 377
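To reproduce the problem, the sample can be loaded from pasted text. This is a minimal sketch that assumes the rows are whitespace-separated exactly as shown. The names `RAW` and `raw_data` are only illustrative:

```python
import io

import pandas as pd

# The sample rows from the question, pasted as whitespace-separated text.
RAW = """\
Zn Pb Ag Cu Mo Cr Ni Co Ba
87 7 0.02 42 2 57 38 14 393
70 6 0.02 56 2 27 29 20 404
75 5 0.02 69 2 44 23 17 417
70 6 0.02 54 1 20 19 12 377
"""

# sep=r"\s+" splits on any run of whitespace; the first line becomes the header.
raw_data = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
print(raw_data.shape)  # (4, 9)
```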
I want to run a calculation on every pair of elements in this dataset and collect the results in a pandas DataFrame. I have written the function below:
import pandas as pd

def correlation_iterated(raw_data, element_concentration):
    columns = element_concentration.split()
    df1 = pd.DataFrame(columns=columns)
    data1 = []
    selected_columns = raw_data.loc[:, element_concentration.split()].columns
    for i in selected_columns:
        for j in selected_columns:
            # another function that takes 'i' and 'j' and returns 'a'
            zipped1 = zip([i], a)
            data1.append(dict(zipped1))
    # DataFrame.append was removed in pandas 2.0; this line needs an older pandas
    df1 = df1.append(data1, True)
    print(df1)
This function is supposed to do the calculation for each pair of elements, producing a 9-by-9 pandas DataFrame with one result stored in each cell. Instead, I get the following:
Zn Pb Ag Cu Mo Cr Ni Co Ba
0 1.000000 NaN NaN NaN NaN NaN NaN NaN NaN
1 0.460611 NaN NaN NaN NaN NaN NaN NaN NaN
2 0.127904 NaN NaN NaN NaN NaN NaN NaN NaN
3 0.276086 NaN NaN NaN NaN NaN NaN NaN NaN
4 -0.164873 NaN NaN NaN NaN NaN NaN NaN NaN
.. ... .. .. .. .. .. .. .. ...
76 NaN NaN NaN NaN NaN NaN NaN NaN 0.113172
77 NaN NaN NaN NaN NaN NaN NaN NaN 0.027251
78 NaN NaN NaN NaN NaN NaN NaN NaN -0.036409
79 NaN NaN NaN NaN NaN NaN NaN NaN 0.041396
80 NaN NaN NaN NaN NaN NaN NaN NaN 1.000000
[81 rows x 9 columns]
In other words, all the results for the first element go into the first column, and the results for each later element are appended as new rows under that element's column, so I end up with 81 rows instead of 9. How can I change the code so that, once one column is finished, the next set of results goes into the next column? I want something like this:
Zn Pb Ag Cu Mo Cr Ni Co Ba
0 1.000000 0.460611 ...
1 0.460611 1.000000 ...
2 0.127904 0.111559 ...
3 0.276086 0.303925 ...
4 -0.164873 -0.190886 ...
5 0.402046 0.338073 ...
6 0.174774 0.096724 ...
7 0.165760 -0.005301 ...
8 -0.043695 0.174193 ...
[9 rows x 9 columns]
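The staggered 81-row result comes from how pandas reads a list of dicts: each dict becomes one row, and any column missing from that dict is filled with NaN. The question's `df1.append(data1, True)` call treats `data1` the same way. (`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0.) This small sketch reproduces the effect with the DataFrame constructor. The three columns and the placeholder value 0.5 are made up for illustration:

```python
import pandas as pd

columns = ["Zn", "Pb", "Ag"]

# Mimic the question's loop: every result is wrapped in a one-key dict {column: value}.
# The value 0.5 is a placeholder, not a real calculation.
rows = []
for i in columns:
    for j in columns:
        rows.append({i: 1.0 if i == j else 0.5})

# Each dict becomes its own row, so 3 columns x 3 partners = 9 rows,
# and every row has exactly one non-NaN cell.
staggered = pd.DataFrame(rows, columns=columns)
print(staggered.shape)  # (9, 3)
print(staggered.notna().sum(axis=1).tolist())  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
```

With 9 elements the same pattern gives the 81 rows shown above.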
Upvotes: 0
Views: 64
Reputation: 173
Could you not just do something like this:
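import pandas as pd

def correlation_iterated(raw_data, element_concentration):
    columns = element_concentration.split()
    data = {}
    selected_columns = raw_data.loc[:, columns].columns
    for i in selected_columns:
        # collect the results of i against every j into one list...
        temp = []
        for j in selected_columns:
            # another function that takes 'i' and 'j' and returns 'a'
            temp.append(a)
        # ...and store that list as the column for i
        data[i] = temp
    # a dict of equal-length lists becomes one column per key
    df = pd.DataFrame(data)
    print(df)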
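This is a runnable version of the dict-of-lists idea. The inner function is not named in the question, so `pairwise_value` is a hypothetical stand-in that assumes a Pearson correlation, since the 1.0 values on the diagonal of the question's output suggest one. Ag is left out of this run because it is constant (0.02) in the four sample rows, which makes its correlation undefined:

```python
import io

import pandas as pd

# The four sample rows from the question.
RAW = """\
Zn Pb Ag Cu Mo Cr Ni Co Ba
87 7 0.02 42 2 57 38 14 393
70 6 0.02 56 2 27 29 20 404
75 5 0.02 69 2 44 23 17 417
70 6 0.02 54 1 20 19 12 377
"""


def pairwise_value(raw_data, i, j):
    # Hypothetical stand-in for the unnamed inner function: Pearson r of columns i and j.
    return raw_data[i].corr(raw_data[j])


def correlation_iterated(raw_data, element_concentration):
    columns = element_concentration.split()
    data = {}
    for i in columns:
        # One list per outer column: the result of i against every j, in column order.
        data[i] = [pairwise_value(raw_data, i, j) for j in columns]
    # A dict of equal-length lists becomes one column per key; naming the rows
    # after the elements turns the result into a labelled square matrix.
    return pd.DataFrame(data, index=columns)


raw_data = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
# Ag is excluded because it is constant in these four rows.
result = correlation_iterated(raw_data, "Zn Pb Cu Mo Cr Ni Co Ba")
print(result.round(3))
```

If the inner calculation really is a Pearson correlation (an assumption), `raw_data[cols].corr()` returns the same matrix in a single call, with no explicit loops.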
Upvotes: 2