Reputation: 1250
I created a DataFrame neighbours using sim_measure_i, which is also a DataFrame.
neighbours = sim_measure_i.apply(lambda s: s.nlargest(k).index.tolist(), axis=1)
neighbours looks like this:
1500 [0, 1, 2, 3, 4]
1501 [0, 1, 2, 3, 4]
1502 [0, 1, 2, 3, 4]
1503 [7230, 12951, 13783, 8000, 18077]
1504 [1, 3, 6, 27, 47]
The second column here holds lists. I want to iterate over this DataFrame and read each element of each list (say 7230), then look up the score for that element in another DataFrame I have, which contains (id, score).
I would then like to add a column to this DataFrame so that it looks like this:
test_case_id nbr_list scores
1500 [0, 1, 2, 3, 4] [+1, -1, -1, +1, -1]
1501 [0, 1, 2, 3, 4] [+1, +1, +1, -1, -1]
1502 [0, 1, 2, 3, 4] [+1, +1, +1, -1, -1]
1503 [7230, 12951, 13783, 8000, 18077] [+1, +1, +1, -1, -1]
1504 [1, 3, 6, 27, 47] [+1, +1, +1, -1, -1]
Edit: I've written a method get_scores()
def get_scores(list_of_neighbours):
    score_matrix = []
    for x, val in enumerate(list_of_neighbours):
        score_matrix.append(df.iloc[val].score)
    return score_matrix
When I try to use lambda on each of nbr_list, I get this error:
TypeError: ("cannot do positional indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [0] of <type 'str'>", u'occurred at index 1500')
The code causing this issue:
def nearest_neighbours(similarity_matrix, k):
    neighbours = pd.DataFrame(similarity_matrix.apply(lambda s: s.nlargest(k).index.tolist(), axis=1))
    neighbours = neighbours.rename(columns={0: 'nbr_list'})
    nbr_scores = neighbours.apply(lambda l: get_scores(l.nbr_list), axis=1)
    print neighbours
Upvotes: 1
Views: 1109
Reputation: 76297
Say you start with neighbors looking like this:
In [87]: neighbors = pd.DataFrame({'neighbors_list': [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]})
In [88]: neighbors
Out[88]:
neighbors_list
0 [0, 1, 2, 3, 4]
1 [0, 1, 2, 3, 4]
You didn't specify exactly how the other DataFrame (containing the id-score pairs) looks, so here is an approximation.
In [89]: id_score = pd.DataFrame({'id': [0, 1, 2, 3, 4], 'score': [1, -1, -1, 1, -1]})
In [90]: id_score
Out[90]:
id score
0 0 1
1 1 -1
2 2 -1
3 3 1
4 4 -1
You can convert this to a dictionary:
In [91]: d = id_score.set_index('id')['score'].to_dict()
And then use apply:
In [92]: neighbors.neighbors_list.apply(lambda l: [d[e] for e in l])
Out[92]:
0 [1, -1, -1, 1, -1]
1 [1, -1, -1, 1, -1]
Name: neighbors_list, dtype: object
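To get the column layout asked for in the question, the result of apply can be assigned back to the DataFrame. The following is a minimal sketch of that, not a definitive implementation. It assumes the same toy data as above, plus a made-up id (99) that has no score, to show that d.get returns None for a missing id where d[e] would raise KeyError. The column name scores is an assumption taken from the desired output in the question.

```python
import pandas as pd

# Toy data; the id 99 is hypothetical and deliberately has no score.
neighbors = pd.DataFrame({'neighbors_list': [[0, 1, 2, 3, 4], [1, 3, 99]]})
id_score = pd.DataFrame({'id': [0, 1, 2, 3, 4], 'score': [1, -1, -1, 1, -1]})

# Map each id to its score.
d = id_score.set_index('id')['score'].to_dict()

# Look up every id in each list. Ids missing from the dictionary
# become None instead of raising KeyError.
neighbors['scores'] = neighbors['neighbors_list'].apply(
    lambda ids: [d.get(i) for i in ids]
)

print(neighbors)
```

Whether a missing id should become None, be skipped, or raise an error depends on the data, so treat the d.get fallback as one option among several.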
Upvotes: 1
Reputation: 738
You can try a nested loop:
for i in range(neighbours.shape[0]):  # iterate over each row
    for j in range(len(neighbours['neighbours_lists'].iloc[i])):  # iterate over each element of the list
        a = neighbours['neighbours_lists'].iloc[i][j]  # element at index j of the list in row i
where i is the outer loop variable, which iterates over the rows, and j is the inner loop variable, which iterates over the positions in the list inside each cell.
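The loop above only reads each element. A possible completion that also looks up a score and stores the results is sketched below. The score_by_id dictionary and its values are hypothetical stand-ins for the (id, score) DataFrame mentioned in the question, and the scores column name is an assumption.

```python
import pandas as pd

neighbours = pd.DataFrame({'neighbours_lists': [[0, 1, 2], [2, 4]]})

# Hypothetical id -> score mapping standing in for the (id, score) DataFrame.
score_by_id = {0: 1, 1: -1, 2: -1, 3: 1, 4: -1}

all_scores = []
for i in range(neighbours.shape[0]):  # iterate over each row
    row_scores = []
    for j in range(len(neighbours['neighbours_lists'].iloc[i])):  # each list position
        a = neighbours['neighbours_lists'].iloc[i][j]  # element j of the list in row i
        row_scores.append(score_by_id[a])  # raises KeyError if the id is unknown
    all_scores.append(row_scores)

# One list of scores per row, aligned on the original index.
neighbours['scores'] = pd.Series(all_scores, index=neighbours.index)

print(neighbours)
```

Explicit loops like this are easy to follow but tend to be slower than apply with a dictionary on large frames.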
Upvotes: 1
Reputation: 16619
Original Data Frame:
In [68]: df
Out[68]:
test_case_id neighbours_lists
0 1500 [0, 1, 2, 3, 4]
1 1501 [0, 1, 2, 3, 4]
2 1502 [0, 1, 2, 3, 4]
3 1503 [7230, 12951, 13783, 8000, 18077]
4 1504 [1, 3, 6, 27, 47]
Custom function which takes id and list and does some computation to evaluate score:
In [69]: def g(_id, nbs):
    ...:     return ['-1' if (_id + 1) % (nb + 1) else '+1' for nb in nbs]
    ...:
Apply custom function to all rows of original data frame:
In [70]: scores = df.apply(lambda x: g(x.test_case_id, x.neighbours_lists), axis=1)
Convert the scores series to a data frame and concat it with the original data frame:
In [71]: df = pd.concat([df, scores.to_frame(name='scores')], axis=1)
In [72]: df
Out[72]:
test_case_id neighbours_lists scores
0 1500 [0, 1, 2, 3, 4] [+1, -1, -1, -1, -1]
1 1501 [0, 1, 2, 3, 4] [+1, +1, -1, -1, -1]
2 1502 [0, 1, 2, 3, 4] [+1, -1, +1, -1, -1]
3 1503 [7230, 12951, 13783, 8000, 18077] [-1, -1, -1, -1, -1]
4 1504 [1, 3, 6, 27, 47] [-1, -1, +1, -1, -1]
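As a variation on the steps above, the new column can also be assigned directly, without building a separate Series and concatenating. This is only a sketch under the same assumptions: g is the placeholder scoring function from this answer, and the two-row frame is a cut-down copy of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'test_case_id': [1500, 1501],
    'neighbours_lists': [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
})

def g(_id, nbs):
    # Placeholder scoring rule from the answer above, not a real score lookup.
    return ['-1' if (_id + 1) % (nb + 1) else '+1' for nb in nbs]

# Pair each id with its list and compute one list of scores per row.
scores = [g(_id, nbs) for _id, nbs in zip(df['test_case_id'], df['neighbours_lists'])]
df['scores'] = pd.Series(scores, index=df.index)

print(df)
```

Replacing g with a real lookup against the (id, score) data is left to the reader, since the question does not show that DataFrame.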
Upvotes: 1