Reputation: 1417
I have a large pandas Series, so performance matters:
s = pd.Series(['I like apples', 'They went skiing vacation', 'Apples are tasty', 'The skiing was great'], dtype='string')
0 I like apples
1 They went skiing vacation
2 Apples are tasty
3 The skiing was great
dtype: string
Treat each row as a list of words, i.e. row 0 is ['I', 'like', 'apples'].
I would like to find the position of a given word, say 'apples', within each row and reorder the rows by that position. In this example, the Series would become:
2 Apples are tasty
0 I like apples
1 They went skiing vacation
3 The skiing was great
dtype: string
because 'apples' (ignoring case) is at position 0 in row 2 and at position 2 in row 0.
Upvotes: 0
Views: 29
Reputation: 862671
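# split each row into words and stack into a Series with a (row, position) MultiIndex
s1 = s.str.split(expand=True).stack()
# keep only the entries matching 'apples', sorted by the second level (position of the word)
idx = s1[s1.str.contains('apples', case=False)].sort_index(level=1).index
# take the matched row labels, union them (unsorted) with the original index so the
# remaining rows follow, then select with loc to change the ordering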
s = s.loc[idx.remove_unused_levels().levels[0].union(s.index, sort=False)]
print (s)
2 Apples are tasty
0 I like apples
1 They went skiing vacation
3 The skiing was great
dtype: string
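For reference, here is a self-contained sketch of the same split-and-stack idea. It is a variant, not the exact code above: it assumes an exact, case-insensitive word match instead of `str.contains`, and it reads the matched row labels with `get_level_values` rather than from the index levels. The names `words`, `hits`, `matched`, `rest` and `result` are only illustrative.

```python
import pandas as pd

s = pd.Series(
    ["I like apples", "They went skiing vacation",
     "Apples are tasty", "The skiing was great"],
    dtype="string",
)

# One entry per word, indexed by (original row label, word position).
# dropna() discards the padding created for rows with fewer words.
words = s.str.split(expand=True).stack().dropna()

# Keep exact, case-insensitive matches and order them by word position.
hits = words[words.str.lower() == "apples"].sort_index(level=1)

# Row labels of matching rows, in order of word position (first match per row).
matched = hits.index.get_level_values(0).unique()

# Rows without a match keep their original relative order and go last.
rest = s.index[~s.index.isin(matched)]

result = s.loc[matched.append(rest)]
print(result)
```

Because the match is on whole words, a row containing only 'pineapples' would count as unmatched here, whereas `str.contains` would accept it.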
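Another idea, starting again from the original s, using a list comprehension with enumerate (rows without a match get the sentinel len(s)*10):
import numpy as np
a = [next((i for i, j in enumerate(x.split()) if j.lower() == 'apples'), len(s)*10) for x in s]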
print (a)
[2, 40, 0, 40]
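# argsort returns positions, so select with iloc; a stable sort keeps ties in original order
s = s.iloc[np.argsort(a, kind='stable')]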
print (s)
2 Apples are tasty
0 I like apples
1 They went skiing vacation
3 The skiing was great
dtype: string
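The list-comprehension idea can also be wrapped in a small helper. This is a minimal sketch under a few assumptions: the function names `word_position` and `reorder_by_word_position` are hypothetical, `math.inf` stands in for the `len(s)*10` sentinel, and missing values are treated as "word not found". It selects by position with `iloc`, so it does not depend on the Series having a default integer index.

```python
import math

import pandas as pd


def word_position(text, word):
    """Return the position of `word` among the whitespace-separated
    tokens of `text`, ignoring case, or math.inf if it is absent."""
    if pd.isna(text):
        return math.inf
    target = word.lower()
    for pos, token in enumerate(text.split()):
        if token.lower() == target:
            return pos
    return math.inf


def reorder_by_word_position(s, word):
    """Reorder `s` so rows where `word` appears earliest come first.
    Rows without the word go last, in their original order."""
    keys = [word_position(x, word) for x in s]
    # sorted() is stable, so equal keys keep their original order.
    order = sorted(range(len(s)), key=keys.__getitem__)
    return s.iloc[order]


s = pd.Series(
    ["I like apples", "They went skiing vacation",
     "Apples are tasty", "The skiing was great"],
    index=list("abcd"),
    dtype="string",
)
out = reorder_by_word_position(s, "apples")
print(out)
```

The loop still runs in Python for every row, so on a very large Series it is worth timing this against the split-and-stack version before choosing one.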
Upvotes: 1