user270199

Reorder Series rows based on the index of a word

I have a large Pandas Series, so performance is key.

import pandas as pd

s = pd.Series(['I like apples', 'They went skiing vacation', 'Apples are tasty', 'The skiing was great'], dtype='string')
print (s)

0                I like apples
1    They went skiing vacation
2             Apples are tasty
3         The skiing was great
dtype: string

Treat each row as a list of words (split on whitespace), i.e. row 0 is ['I', 'like', 'apples'].
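For reference, pandas' `Series.str.split()` produces exactly this list-of-words view. A small sketch, assuming whitespace is the only word separator (punctuation would stay attached to the neighbouring word):

```python
import pandas as pd

s = pd.Series(['I like apples', 'They went skiing vacation',
               'Apples are tasty', 'The skiing was great'], dtype='string')

# With no arguments, str.split() splits on runs of whitespace,
# so every row becomes a Python list of words
words = s.str.split()
print(words[0])     # ['I', 'like', 'apples']
print(words[2][0])  # Apples
```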

I would like to find the position (index) of a word, say 'apples', within each row and reorder the rows by that position. In this example, the Series would become:

2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string

because 'apples' (ignoring case) is at position 0 in row 2 and at position 2 in row 0; rows 1 and 3 do not contain it and follow in their original order.

Answers (1)

jezrael

Use Series.str.split with expand=True, stack and Series.str.contains:

#split into words and reshape to a Series with a (row label, word position) MultiIndex
s1 = s.str.split(expand=True).stack()
#keep only the entries matching apples and sort by the second level (position of apples)
idx = s1[s1.str.contains('apples', case=False)].sort_index(level=1).index

#take the matching row labels in that order, append the remaining labels with union and reorder with loc
s = s.loc[idx.get_level_values(0).drop_duplicates().union(s.index, sort=False)]
print (s)
2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string
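Note that `str.contains('apples', case=False)` matches substrings, so a row such as 'Pineapples are sweet' would also count as a hit. If whole-word matching is intended (the list-of-words framing in the question suggests it), the same stack approach can compare words exactly instead. A sketch, wrapped in a hypothetical helper `reorder_by_word`; when a word occurs more than once in a row, the row is placed by its first occurrence:

```python
import pandas as pd


def reorder_by_word(s: pd.Series, word: str) -> pd.Series:
    """Hypothetical helper: move rows containing `word` (whole word, any case)
    to the front, ordered by the word's first position; other rows follow unchanged."""
    # Long format: one entry per word, MultiIndex = (row label, word position)
    words = s.str.split(expand=True).stack()
    # Exact, case-insensitive word comparison instead of substring matching
    hits = words[words.str.lower() == word.lower()]
    # Sort hits by position (level 1), then keep each row label's first occurrence
    matched = hits.sort_index(level=1).index.get_level_values(0).drop_duplicates()
    # Append the labels of non-matching rows in their original order
    order = matched.union(s.index, sort=False)
    return s.loc[order]


s = pd.Series(['I like apples', 'They went skiing vacation',
               'Apples are tasty', 'The skiing was great',
               'Pineapples are sweet'], dtype='string')
result = reorder_by_word(s, 'apples')
print(result.index.tolist())  # [2, 0, 1, 3, 4]
```

Row 4 stays at the end here because 'Pineapples' is not the word 'apples'; with `str.contains` it would have been treated as a match.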

Another idea with list comprehension and enumerate:

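import numpy as np

#position of the first case-insensitive 'apples' in each row, inf if missing
a = [next((i for i, j in enumerate(x.split()) if j.lower() == 'apples'), np.inf) for x in s]
print (a)
[2, inf, 0, inf]

#stable argsort keeps rows without apples in original order; iloc because argsort returns positions
s = s.iloc[np.argsort(a, kind='stable')]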
print (s)
2             Apples are tasty
0                I like apples
1    They went skiing vacation
3         The skiing was great
dtype: string
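The same per-row key can also be handed to pandas' own sort through the `key` argument of `Series.sort_values`, which replaces the manual argsort and positional indexing. A sketch, assuming pandas 1.1 or later (where `key` exists); `word_position` is a hypothetical helper, and this is still a per-row Python loop, so it is not expected to be faster than the list comprehension:

```python
import numpy as np
import pandas as pd


def word_position(text: str, word: str = 'apples') -> float:
    """Hypothetical helper: position of the first case-insensitive whole-word
    match of `word` in `text`, or +inf when the word does not occur."""
    target = word.lower()
    return next((i for i, w in enumerate(text.split()) if w.lower() == target), np.inf)


s = pd.Series(['I like apples', 'They went skiing vacation',
               'Apples are tasty', 'The skiing was great'], dtype='string')

# key= receives the whole Series and must return one sort key per row;
# kind='stable' keeps rows with equal keys (here the two inf rows) in original order
result = s.sort_values(
    key=lambda col: pd.Series([word_position(t) for t in col], index=col.index),
    kind='stable',
)
print(result.index.tolist())  # [2, 0, 1, 3]
```

Using `np.inf` rather than a finite sentinel guarantees that rows without the word always sort last, however many words a row contains.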
