I have a list and a pandas DataFrame, and I want to use a for loop, an if statement and zip on them to get a single value from each row of the DataFrame, based on the corresponding value in v.
List v:
v = [3, 2, 1, 0, 4, 0, 0, 1, 2, 4]
pandas DataFrame df:
1st 2nd 3rd 4th
b1 0.498717 0.264786 0.00992303 0.000516895
b2 0.427093 0.0990702 0.00107178 2.75326e-05
b3 0.276645 0.0322039 0.000112341 1.60488e-06
b4 0.14827 0.00928838 1.09752e-05 9.2808e-08
b5 0.0975582 0.00440099 2.86551e-06 1.83807e-08
b6 0.0302828 0.0006493 1.04099e-07 3.58615e-10
b7 0.0211258 0.000372098 4.07256e-08 1.19155e-10
b8 0.00833787 9.24801e-05 4.0522e-09 8.08719e-12
b9 0.028685 0.000596652 9.02113e-08 3.03026e-10
b10 0.000693003 2.7417e-06 1.4319e-11 1.22682e-14
I tried the following, but it returns an empty DataFrame:
n=[] #or pd.DataFrame()
for ns in range(0, len(v)):
    for i,row in list(zip(v,df)):# df.row,.iterrows(),.index
        print(row)
        if i ==1:
            n.append(row.iloc[ns]['1st'])
        elif i==2:
            n.append(row.iloc[ns]['2nd'])
        elif i==3:
            n.append(row.iloc[ns]['3rd'])
        elif i == 4:
            n.append(row.iloc[ns]['4th'])
        else:
            n.append(0)
vs=n
print(vs)
Output:
Empty DataFrame
Columns: []
Index: []
The output I am looking for:
vs=[0.00992303, 0.0990702, 0.276645, 0, .......] # or pd.DataFrame
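For comparison with the attempt above, here is a minimal sketch of a working for/zip/if version. It assumes, from the desired output, that a value k in 1..4 means "take the k-th column of that row" and that 0 means "append 0". The main fix is to zip v with the rows produced by df.iterrows(); zipping v with df directly iterates over the column names, not the rows. The DataFrame is rebuilt from the table in the question so the snippet runs on its own.

```python
import pandas as pd

# Data from the question, rebuilt so the example is self-contained
data = [
    [0.498717, 0.264786, 0.00992303, 0.000516895],
    [0.427093, 0.0990702, 0.00107178, 2.75326e-05],
    [0.276645, 0.0322039, 0.000112341, 1.60488e-06],
    [0.14827, 0.00928838, 1.09752e-05, 9.2808e-08],
    [0.0975582, 0.00440099, 2.86551e-06, 1.83807e-08],
    [0.0302828, 0.0006493, 1.04099e-07, 3.58615e-10],
    [0.0211258, 0.000372098, 4.07256e-08, 1.19155e-10],
    [0.00833787, 9.24801e-05, 4.0522e-09, 8.08719e-12],
    [0.028685, 0.000596652, 9.02113e-08, 3.03026e-10],
    [0.000693003, 2.7417e-06, 1.4319e-11, 1.22682e-14],
]
df = pd.DataFrame(data, columns=["1st", "2nd", "3rd", "4th"],
                  index=[f"b{i}" for i in range(1, 11)])
v = [3, 2, 1, 0, 4, 0, 0, 1, 2, 4]

n = []
# zip pairs each value of v with one row of df; iterrows yields (label, row)
for i, (label, row) in zip(v, df.iterrows()):
    if i == 0:
        # no column corresponds to 0, so append 0
        n.append(0.0)
    else:
        # 1 -> '1st', 2 -> '2nd', ... selected by position within the row
        n.append(row.iloc[i - 1])

vs = n
print(vs)
```

This works, but it is a Python-level loop over rows; the vectorized approaches in the answer avoid that.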
First, don't loop over pandas DataFrame rows when a vectorized solution exists.
You can use NumPy integer-array indexing, but because there is no column for the value 0, first prepend a column of zeros to the 2D array with np.hstack:
import numpy as np
arr = np.hstack((np.zeros((len(df), 1)), df.to_numpy()))
print (arr)
[[0.00000e+00 4.98717e-01 2.64786e-01 9.92303e-03 5.16895e-04]
[0.00000e+00 4.27093e-01 9.90702e-02 1.07178e-03 2.75326e-05]
[0.00000e+00 2.76645e-01 3.22039e-02 1.12341e-04 1.60488e-06]
[0.00000e+00 1.48270e-01 9.28838e-03 1.09752e-05 9.28080e-08]
[0.00000e+00 9.75582e-02 4.40099e-03 2.86551e-06 1.83807e-08]
[0.00000e+00 3.02828e-02 6.49300e-04 1.04099e-07 3.58615e-10]
[0.00000e+00 2.11258e-02 3.72098e-04 4.07256e-08 1.19155e-10]
[0.00000e+00 8.33787e-03 9.24801e-05 4.05220e-09 8.08719e-12]
[0.00000e+00 2.86850e-02 5.96652e-04 9.02113e-08 3.03026e-10]
[0.00000e+00 6.93003e-04 2.74170e-06 1.43190e-11 1.22682e-14]]
out = arr[np.arange(len(df)), v].tolist()
print (out)
[0.00992303, 0.0990702, 0.276645, 0.0, 1.83807e-08, 0.0, 0.0,
0.00833787, 0.0005966519999999999, 1.22682e-14]
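A variant of the same NumPy indexing that skips building the padded copy of the whole array: shift v to 0-based column positions, index the original values, and then overwrite the positions where v is 0 with np.where. This is a sketch that rebuilds df from the question's table; the clamping step is needed because v - 1 would be -1 for zeros, which NumPy would silently read as the last column.

```python
import numpy as np
import pandas as pd

# Data from the question, rebuilt so the example is self-contained
data = [
    [0.498717, 0.264786, 0.00992303, 0.000516895],
    [0.427093, 0.0990702, 0.00107178, 2.75326e-05],
    [0.276645, 0.0322039, 0.000112341, 1.60488e-06],
    [0.14827, 0.00928838, 1.09752e-05, 9.2808e-08],
    [0.0975582, 0.00440099, 2.86551e-06, 1.83807e-08],
    [0.0302828, 0.0006493, 1.04099e-07, 3.58615e-10],
    [0.0211258, 0.000372098, 4.07256e-08, 1.19155e-10],
    [0.00833787, 9.24801e-05, 4.0522e-09, 8.08719e-12],
    [0.028685, 0.000596652, 9.02113e-08, 3.03026e-10],
    [0.000693003, 2.7417e-06, 1.4319e-11, 1.22682e-14],
]
df = pd.DataFrame(data, columns=["1st", "2nd", "3rd", "4th"],
                  index=[f"b{i}" for i in range(1, 11)])
v = np.array([3, 2, 1, 0, 4, 0, 0, 1, 2, 4])

vals = df.to_numpy()
rows = np.arange(len(df))
# 0-based column positions; clamp the -1 produced by v == 0 to a valid index
cols = np.maximum(v - 1, 0)
# pick one value per row, then replace the entries where v == 0 with 0.0
out = np.where(v == 0, 0.0, vals[rows, cols]).tolist()
print(out)
```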
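Another idea is to insert a first column of zeros with DataFrame.insert, rename the columns to a range, and then use DataFrame.lookup (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0, so this only works on older versions):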
df.insert(0,'zero',0)
df.columns = range(len(df.columns))
print (df)
0 1 2 3 4
b1 0 0.498717 0.264786 9.923030e-03 5.168950e-04
b2 0 0.427093 0.099070 1.071780e-03 2.753260e-05
b3 0 0.276645 0.032204 1.123410e-04 1.604880e-06
b4 0 0.148270 0.009288 1.097520e-05 9.280800e-08
b5 0 0.097558 0.004401 2.865510e-06 1.838070e-08
b6 0 0.030283 0.000649 1.040990e-07 3.586150e-10
b7 0 0.021126 0.000372 4.072560e-08 1.191550e-10
b8 0 0.008338 0.000092 4.052200e-09 8.087190e-12
b9 0 0.028685 0.000597 9.021130e-08 3.030260e-10
b10 0 0.000693 0.000003 1.431900e-11 1.226820e-14
out = df.lookup(df.index, v).tolist()
print (out)
[0.00992303, 0.0990702, 0.276645, 0.0, 1.83807e-08, 0.0, 0.0,
0.00833787, 0.0005966519999999999, 1.22682e-14]
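Because DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of one possible replacement, assuming the same zero-column preparation: Index.get_indexer converts the labels in v into column positions, and NumPy indexing on df.to_numpy() then picks one value per row. It stays label-based like lookup; get_indexer returns -1 for labels that are not columns, so the sketch checks for that before indexing.

```python
import numpy as np
import pandas as pd

# Data from the question, rebuilt so the example is self-contained
data = [
    [0.498717, 0.264786, 0.00992303, 0.000516895],
    [0.427093, 0.0990702, 0.00107178, 2.75326e-05],
    [0.276645, 0.0322039, 0.000112341, 1.60488e-06],
    [0.14827, 0.00928838, 1.09752e-05, 9.2808e-08],
    [0.0975582, 0.00440099, 2.86551e-06, 1.83807e-08],
    [0.0302828, 0.0006493, 1.04099e-07, 3.58615e-10],
    [0.0211258, 0.000372098, 4.07256e-08, 1.19155e-10],
    [0.00833787, 9.24801e-05, 4.0522e-09, 8.08719e-12],
    [0.028685, 0.000596652, 9.02113e-08, 3.03026e-10],
    [0.000693003, 2.7417e-06, 1.4319e-11, 1.22682e-14],
]
df = pd.DataFrame(data, columns=["1st", "2nd", "3rd", "4th"],
                  index=[f"b{i}" for i in range(1, 11)])
v = [3, 2, 1, 0, 4, 0, 0, 1, 2, 4]

# Same preparation as above: a zero column in front, then labels 0..4
df.insert(0, "zero", 0)
df.columns = range(len(df.columns))

# Map each label in v to a column position (-1 means "not found")
col_pos = df.columns.get_indexer(v)
assert (col_pos >= 0).all(), "some value in v is not a column label"

# One value per row: row i, column col_pos[i]
out = df.to_numpy()[np.arange(len(df)), col_pos].tolist()
print(out)
```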
A similar idea, if you need to keep the original DataFrame unchanged, builds the padded data in a new variable df1:
df1 = (df.set_index(np.zeros(len(df)))
         .reset_index()
         .set_axis(np.arange(len(df.columns) + 1), axis=1))
print (df1)
0 1 2 3 4
0 0.0 0.498717 0.264786 9.923030e-03 5.168950e-04
1 0.0 0.427093 0.099070 1.071780e-03 2.753260e-05
2 0.0 0.276645 0.032204 1.123410e-04 1.604880e-06
3 0.0 0.148270 0.009288 1.097520e-05 9.280800e-08
4 0.0 0.097558 0.004401 2.865510e-06 1.838070e-08
5 0.0 0.030283 0.000649 1.040990e-07 3.586150e-10
6 0.0 0.021126 0.000372 4.072560e-08 1.191550e-10
7 0.0 0.008338 0.000092 4.052200e-09 8.087190e-12
8 0.0 0.028685 0.000597 9.021130e-08 3.030260e-10
9 0.0 0.000693 0.000003 1.431900e-11 1.226820e-14
out = df1.lookup(df1.index, v).tolist()
print (out)
[0.00992303, 0.0990702, 0.276645, 0.0, 1.83807e-08, 0.0, 0.0,
0.00833787, 0.0005966519999999999, 1.22682e-14]
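Since the question also mentions wanting a pandas object as output, here is a sketch of the non-mutating idea that runs on pandas 2.x. It uses pd.concat to prepend the zero column (a swap for the set_index/reset_index trick) and NumPy indexing in place of lookup, and it returns a Series labelled b1..b10; call .to_frame() on it if a DataFrame is really needed. The names padded and picked are illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the question, rebuilt so the example is self-contained
data = [
    [0.498717, 0.264786, 0.00992303, 0.000516895],
    [0.427093, 0.0990702, 0.00107178, 2.75326e-05],
    [0.276645, 0.0322039, 0.000112341, 1.60488e-06],
    [0.14827, 0.00928838, 1.09752e-05, 9.2808e-08],
    [0.0975582, 0.00440099, 2.86551e-06, 1.83807e-08],
    [0.0302828, 0.0006493, 1.04099e-07, 3.58615e-10],
    [0.0211258, 0.000372098, 4.07256e-08, 1.19155e-10],
    [0.00833787, 9.24801e-05, 4.0522e-09, 8.08719e-12],
    [0.028685, 0.000596652, 9.02113e-08, 3.03026e-10],
    [0.000693003, 2.7417e-06, 1.4319e-11, 1.22682e-14],
]
df = pd.DataFrame(data, columns=["1st", "2nd", "3rd", "4th"],
                  index=[f"b{i}" for i in range(1, 11)])
v = [3, 2, 1, 0, 4, 0, 0, 1, 2, 4]

# Prepend a zero column in a new object; df keeps its original columns
padded = pd.concat([pd.Series(0.0, index=df.index, name="zero"), df], axis=1)

# Row i, column v[i] of the padded values, wrapped in a labelled Series
picked = padded.to_numpy()[np.arange(len(padded)), v]
vs = pd.Series(picked, index=df.index, name="vs")
print(vs)
```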