Reputation: 475

Loop, iloc and loc in Dataframe?

Hey guys, I'm new to Python and I'm just starting to work with libraries such as pandas and NumPy. My teacher recently gave me this exercise, and I don't know which method I should use. Details are below:

import pandas as pd
df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]})

1) col1 - col5: random numbers

2) loc: the (0-based) column position of the value to pick in that row.

3) Calculate a new column 'val' that holds, for each row, the value found in the column whose position is given in 'loc'.

Example: in row 0, loc = 1, so val = 23 (from col2). In row 1, loc = 4, so val = 15 (from col5), and so on.
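Put differently, 'loc' counts columns from 0, so 0 means col1, 1 means col2, and so on. A small sketch (reusing the question's df1) that shows which column each row points at:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]})

# Translate each row's 0-based position in 'loc' into the column name it refers to
target_cols = df1.columns[df1['loc'].to_numpy()]
print(list(target_cols))  # ['col2', 'col5', 'col3', 'col4', 'col3']
```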

The result should look like this:

df = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                   'col2': [23, 4, 1, 1, 3],
                   'col3': [0, 5, 2, 1, 1],
                   'col4': [1, 2, 6, 4, 0],
                   'col5': [4, 15, 0, 2, 5],
                   'loc': [1, 4, 2, 3, 2],
                   'val': [23, 15, 2, 4, 1]})

I have tried something with iloc and loc to calculate 'val', but the way I did it (one line per row, shown below) no longer works once the DataFrame gets larger. Is there a faster way to calculate 'val'? Do I need a loop?

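positions = df['loc']        # column positions, one per row
df.iloc[0, positions[0]]     # row 0 -> 23
df.iloc[1, positions[1]]     # row 1 -> 15
df.iloc[2, positions[2]]     # row 2 -> 2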

PS: Sorry for my bad English. I really didn't know how to explain this exercise in English, I'm just doing my best :(

Upvotes: 2

Answers (3)

Niels Henkens

Reputation: 2696

You can use a for loop for this, with a counter i running over range(len(df['loc'])). With .iloc you can then select row i and, within that row, the column whose position is stored in the 'loc' column.

I'm not going to spell out the complete solution for you, but something along these lines:

vals = [] # Create an empty list to hold the requested values
for i in range(len(df['loc'])): # Loop over the row positions ('i')
    val = df.iloc[i, df['loc'].iloc[i]] # Row 'i', column position read (positionally) from 'loc'
    vals.append(val) # Append the value to list 'vals'
df['val'] = vals # Add list 'vals' as a new column to the DataFrame

Edited to complete the answer.
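For a quick check, here is the same loop run end to end on the question's data (a sketch; the frame is rebuilt inline so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                   'col2': [23, 4, 1, 1, 3],
                   'col3': [0, 5, 2, 1, 1],
                   'col4': [1, 2, 6, 4, 0],
                   'col5': [4, 15, 0, 2, 5],
                   'loc': [1, 4, 2, 3, 2]})

vals = []
for i in range(len(df)):
    # Row position i, column position taken from that row's 'loc' value
    vals.append(df.iloc[i, df['loc'].iloc[i]])
df['val'] = vals

print(df['val'].tolist())  # [23, 15, 2, 4, 1]
```

This is fine for small frames, but every iteration goes through pandas' indexing machinery, which is why it slows down as the number of rows grows.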

Upvotes: 2

RockStar

Reputation: 1314

Loop over the 'loc' Series together with its index, then use DataFrame.iat[row position, column position] to get each exact value. From the resulting list of values you can create the new column.

result = []
for index, row in df1['loc'].items():  # items() replaces iteritems(), removed in pandas 2.0
    result.append(df1.iat[index, row])  # 'row' holds the column position from 'loc'
df1['val'] = result
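One caveat: items() yields index labels, while .iat expects integer positions, so the two only line up for the default 0..n-1 RangeIndex. A minimal variant that counts positions with enumerate() instead (the string index 'a'..'e' is a hypothetical example, chosen only to show the difference):

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]},
                   index=['a', 'b', 'c', 'd', 'e'])  # hypothetical non-default index

# enumerate() yields the row *position*, which is what .iat needs,
# so the lookup does not depend on the index labels
result = [df1.iat[pos, col] for pos, col in enumerate(df1['loc'])]
df1['val'] = result

print(df1['val'].tolist())  # [23, 15, 2, 4, 1]
```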

Upvotes: 1

jezrael

Reputation: 863501

Use NumPy integer-array indexing, especially if performance is important:

import numpy as np
df1['value'] = df1.values[np.arange(len(df1)), df1['loc']]
print(df1)
   col1  col2  col3  col4  col5  loc  value
0     0    23     0     1     4    1     23
1     1     4     5     2    15    4     15
2     1     1     2     6     0    2      2
3     0     1     1     4     2    3      4
4     3     3     1     0     5    2      1
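Why this works: when a NumPy array is indexed with two integer arrays of the same length, NumPy pairs them element by element (row k with column k) rather than forming every combination. The sketch below spells the pairing out. As a design choice of this sketch, it converts only col1..col5 (names taken from the question) to an array, so a position from 'loc' can never land on 'loc' itself or on a previously added result column.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                    'col2': [23, 4, 1, 1, 3],
                    'col3': [0, 5, 2, 1, 1],
                    'col4': [1, 2, 6, 4, 0],
                    'col5': [4, 15, 0, 2, 5],
                    'loc': [1, 4, 2, 3, 2]})

# Only the value columns; the positions in 'loc' refer to these
values = df1[['col1', 'col2', 'col3', 'col4', 'col5']].to_numpy()
rows = np.arange(len(df1))    # [0, 1, 2, 3, 4]
cols = df1['loc'].to_numpy()  # [1, 4, 2, 3, 2]

# Picks values[0, 1], values[1, 4], values[2, 2], values[3, 3], values[4, 2]
df1['val'] = values[rows, cols]

print(df1['val'].tolist())  # [23, 15, 2, 4, 1]
```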

Performance:

#5000 rows
df1 = pd.concat([df1] * 1000, ignore_index=True)
In [73]: %timeit df1['value'] = df1.values[np.arange(len(df1)), df1['loc']]
266 µs ± 8.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [74]: %%timeit
    ...: result = []
    ...: for index, row in df1['loc'].iteritems():
    ...:      result.append(df1.iat[index, row])
    ...: df1['val'] = result
    ...: 
64 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [75]: %timeit df1['value'] = df1.apply(lambda x: x.iloc[x['loc']], axis = 1)
243 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
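Timings are only comparable if the three approaches return the same values. A sketch that rebuilds the 5000-row frame and checks that they agree (it uses items() in place of the since-removed iteritems(), and does not re-measure any timings):

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'col1': [0, 1, 1, 0, 3],
                     'col2': [23, 4, 1, 1, 3],
                     'col3': [0, 5, 2, 1, 1],
                     'col4': [1, 2, 6, 4, 0],
                     'col5': [4, 15, 0, 2, 5],
                     'loc': [1, 4, 2, 3, 2]})
df1 = pd.concat([base] * 1000, ignore_index=True)  # 5000 rows

# 1) NumPy integer-array indexing
numpy_vals = df1.values[np.arange(len(df1)), df1['loc'].to_numpy()]

# 2) Python loop with .iat (labels equal positions thanks to ignore_index=True)
loop_vals = [df1.iat[index, row] for index, row in df1['loc'].items()]

# 3) Row-wise apply
apply_vals = df1.apply(lambda x: x.iloc[x['loc']], axis=1)

assert (numpy_vals == np.array(loop_vals)).all()
assert (numpy_vals == apply_vals.to_numpy()).all()
print(numpy_vals[:5].tolist())  # [23, 15, 2, 4, 1]
```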

Upvotes: 3
