Reputation: 475
Hey guys, I'm new to Python and am starting to work with libraries such as pandas and NumPy. My teacher recently gave me this exercise, and I don't know which method I should use. Details are shown below:
df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]})
1) col1 - col5: random numbers.
2) loc: the column position of the value to pick in each row.
3) Calculate 'val', which holds, for each row, the value found at the column position given in 'loc'.
Example: In row 0, loc = 1, so val = 23. In row 1, loc = 4, so val = 15, etc.
The result should be like this:
df = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                   'col2': [23, 4, 1, 1, 3],
                   'col3': [0, 5, 2, 1, 1],
                   'col4': [1, 2, 6, 4, 0],
                   'col5': [4, 15, 0, 2, 5],
                   'loc': [1, 4, 2, 3, 2],
                   'val': [23, 15, 2, 4, 1]})
I have tried something like iloc and loc to calculate 'val', one row at a time. However, when the DataFrame becomes larger I can no longer use this method. Is there a faster way to calculate 'val'? Do I need to use a loop to calculate 'val'?
df1 = df['loc']
df.iloc[0,df1[0]]
df.iloc[1,df1[1]]
df.iloc[2,df1[2]]
PS: Sorry for my bad English, but I really don't know how to explain this exercise in English. I'm just trying my best :(
Upvotes: 2
Views: 26738
Reputation: 2696
You can use a for-loop for this, iterating over the row positions from 0 up to the length of the 'loc' column (for example). With .iloc you can then select, in each row, the value at the position given in the 'loc' column.
I'm not going to spill out the complete solution for you, but something along the lines of:
vals = []  # Create an empty list to hold the requested values
for i in range(len(df['loc'])):  # Loop over the rows ('i')
    val = df.iloc[i, df['loc'][i]]  # Get the requested value from row 'i'
    vals.append(val)  # Append the value to list 'vals'
df['value'] = vals  # Add list 'vals' as a new column to the DataFrame
edited to complete the answer...
Upvotes: 2
Reputation: 1314
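Loop over the 'loc' series together with its index, then use DataFrame.iat[row int pos, column int pos] to get the exact value. From the collected values you can create a new column. Note that this uses the index labels as row positions, which works here because the DataFrame has the default 0, 1, 2, ... index.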
result = []
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, row in df1['loc'].items():
    result.append(df1.iat[index, row])
df1['val'] = result
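If the DataFrame does not have the default index, the labels can no longer double as row positions. A minimal sketch of one way around that, assuming a made-up index of 10, 20, ... purely for illustration, is to take the position from enumerate instead of from the index:

```python
import pandas as pd

# Same data as in the question, but with a hypothetical non-default index
df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]},
                   index=[10, 20, 30, 40, 50])

result = []
# enumerate gives the row position (0, 1, 2, ...) regardless of the index labels
for pos, col in enumerate(df1['loc'].tolist()):
    result.append(df1.iat[pos, col])  # iat is purely positional: [row pos, column pos]
df1['val'] = result

print(df1['val'].tolist())  # [23, 15, 2, 4, 1]
```

This is still a Python-level loop, so it only addresses correctness with other indexes, not speed.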
Upvotes: 1
Reputation: 863501
Use numpy indexing, especially if performance is important:
df1['value'] = df1.values[np.arange(len(df1)), df1['loc']]
print(df1)
   col1  col2  col3  col4  col5  loc  value
0     0    23     0     1     4    1     23
1     1     4     5     2    15    4     15
2     1     1     2     6     0    2      2
3     0     1     1     4     2    3      4
4     3     3     1     0     5    2      1
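To make the one-liner easier to follow, here is a self-contained sketch that splits it into steps. The intermediate names arr, rows and cols are only for illustration, and to_numpy() is used in place of .values, which should behave the same here:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]})

arr = df1.to_numpy()           # 2-D array of shape (5, 6)
rows = np.arange(len(df1))     # row positions: [0, 1, 2, 3, 4]
cols = df1['loc'].to_numpy()   # column positions: [1, 4, 2, 3, 2]

# Integer array indexing pairs the two arrays element by element,
# picking (0, 1), (1, 4), (2, 2), (3, 3), (4, 2) in a single call
df1['val'] = arr[rows, cols]

print(df1['val'].tolist())     # [23, 15, 2, 4, 1]
```

Because the whole selection happens inside NumPy, there is no Python-level loop over the rows.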
Performance:
#5000 rows
df1 = pd.concat([df1] * 1000, ignore_index=True)
In [73]: %timeit df1['value'] = df1.values[np.arange(len(df1)), df1['loc']]
266 µs ± 8.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [74]: %%timeit
    ...: result = []
    ...: for index, row in df1['loc'].iteritems():
    ...:     result.append(df1.iat[index, row])
    ...: df1['val'] = result
    ...:
64 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [75]: %timeit df1['value'] = df1.apply(lambda x: x.iloc[x['loc']], axis = 1)
243 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
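For anyone who wants to repeat the comparison outside IPython, a rough sketch with the standard timeit module could look like the following. The helper names numpy_lookup and iat_loop are made up for this example, and the timings will differ from the ones above depending on machine and library versions:

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                     'col2': [23, 4, 1, 1, 3],
                     'col3': [0, 5, 2, 1, 1],
                     'col4': [1, 2, 6, 4, 0],
                     'col5': [4, 15, 0, 2, 5],
                     'loc': [1, 4, 2, 3, 2]})
big = pd.concat([base] * 1000, ignore_index=True)  # 5000 rows


def numpy_lookup(df):
    # Vectorized: one NumPy indexing call for all rows
    return df.to_numpy()[np.arange(len(df)), df['loc'].to_numpy()]


def iat_loop(df):
    # Python-level loop: one positional lookup per row
    return [df.iat[pos, col] for pos, col in enumerate(df['loc'].tolist())]


# Both approaches should return the same values
same = list(numpy_lookup(big)) == iat_loop(big)
print("same result:", same)

runs = 5
for fn in (numpy_lookup, iat_loop):
    seconds = timeit.timeit(lambda: fn(big), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```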
Upvotes: 3