asd

Reputation: 1309

Replace values by order of list

keywords = ['Small', 'Medium', 'Large']
df.column
0   The Small, Large, Medium
1   The fast Medium, Small XS
2   He was a Medium, Large or Small

If a row contains any of the keywords, how can I tell pandas to:

  1. Reorder the keywords so that they appear in the order of the list
  2. If a keyword is followed by the suffix "XS", keep the suffix attached to that keyword during step 1

Expected Output:

0 The Small, Medium, Large 
1 The fast Small XS, Medium
2 He was a Small, Medium or Large

Upvotes: 2

Views: 551

Answers (1)

Nick

Reputation: 147166

One way to do this is to:

  1. Use re.findall to split the string into tokens: keywords (with or without the XS suffix) and the remaining non-matching parts
  2. Sort the tokens that are keywords according to their index in the keywords list
  3. Rebuild the token list, putting the sorted keywords into the positions where keywords originally appeared
  4. Join the tokens back into a single string

You can do that with this function:

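import re

def sizesorter(s, keywords):
    # Tokenise: a keyword (optionally followed by " XS"), or any other chunk up to the next whitespace
    words = re.findall(r'((?:\b(?:' + '|'.join(keywords) + r')\b)(?:\sXS)?|(?:[^\s]*(?:\s|$)))', s, re.I)
    # Sort only the keyword tokens, by their position in the keywords list
    sizes = iter(sorted([w for w in words if w.split(' ')[0] in keywords], key=lambda w:keywords.index(w.split(' ')[0])))
    # Put the sorted keyword tokens back into the slots where keywords appeared
    words = [w if w.split(' ')[0] not in keywords else next(sizes) for w in words]
    return ''.join(words)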
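
To see what step 1 produces, it can help to run the pattern on its own. The sketch below builds the same regular expression as sizesorter and prints the tokens for one sample string; the helper name tokenize is hypothetical and used only for this illustration.

```python
import re

def tokenize(s, keywords):
    # Same pattern as in sizesorter: a keyword (optionally followed by " XS"),
    # or a run of non-whitespace followed by one whitespace character or end of string.
    pattern = r'((?:\b(?:' + '|'.join(keywords) + r')\b)(?:\sXS)?|(?:[^\s]*(?:\s|$)))'
    return re.findall(pattern, s, re.I)

tokens = tokenize('The fast Medium, Small XS', ['Small', 'Medium', 'Large'])
print(tokens)
# ['The ', 'fast ', 'Medium', ', ', 'Small XS', '']
```

The trailing empty string comes from the pattern matching at the end of the string; it is harmless because the tokens are joined with ''. Joining the tokens unchanged gives back the original string, which is why only the keyword tokens need to be reordered.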

You can then apply that function to the column. For example:

import pandas as pd
import re

df = pd.DataFrame({'column': [
    'The Small, Large, Medium',
    'The fast Medium, Small XS',
    'He was a Medium, Large or Small',
    'small, Large a metre',
]})

def sizesorter(s, keywords):
    words = re.findall(r'((?:\b(?:' + '|'.join(keywords) + r')\b)(?:\sXS)?|(?:[^\s]*(?:\s|$)))', s, re.I)
    sizes = iter(sorted([w for w in words if w.split(' ')[0] in keywords], key=lambda w:keywords.index(w.split(' ')[0])))
    words = [w if w.split(' ')[0] not in keywords else next(sizes) for w in words]
    return ''.join(words)
    
df.column = df.column.apply(sizesorter, args=(['Small', 'Medium', 'Large'], ))

print(df)

Output:

                            column
0         The Small, Medium, Large
1        The fast Small XS, Medium
2  He was a Small, Medium or Large
3             small, Large a metre
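
Note that re.I makes the pattern match keywords in any case, but the membership test against keywords is case-sensitive, so a lowercase match such as the small in the last sample row is left in place rather than sorted. If case-insensitive sorting is wanted, one possible variant is sketched below. It is not part of the original function: the name sizesorter_ci, the lowercase lookup table and the use of re.escape are illustrative additions.

```python
import re

def sizesorter_ci(s, keywords):
    # Map lowercased keyword -> position in the list, so lookups ignore case.
    rank = {k.lower(): i for i, k in enumerate(keywords)}
    # re.escape guards against keywords that contain regex metacharacters.
    alternatives = '|'.join(re.escape(k) for k in keywords)
    pattern = r'((?:\b(?:' + alternatives + r')\b)(?:\sXS)?|(?:[^\s]*(?:\s|$)))'
    words = re.findall(pattern, s, re.I)

    def key(w):
        # Rank of the token's first word, or None if it is not a keyword.
        return rank.get(w.split(' ')[0].lower())

    # Sort only the keyword tokens, then drop them back into the keyword slots.
    sizes = iter(sorted((w for w in words if key(w) is not None), key=key))
    return ''.join(w if key(w) is None else next(sizes) for w in words)

print(sizesorter_ci('large, Medium or small XS', ['Small', 'Medium', 'Large']))
# small XS, Medium or large
```

The matched text keeps its original casing; only the ordering ignores case.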

Partial sorting of the list of words adapted from this answer.
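
The partial-sorting idea can also be seen in isolation: sort only the items that satisfy a predicate and leave every other item in its original position. A minimal sketch, where partial_sort and its parameters are hypothetical names chosen for this illustration:

```python
def partial_sort(items, is_sortable, key=None):
    # Sort just the selected items, then feed them back, in order,
    # into the slots where selected items originally stood.
    selected = iter(sorted((x for x in items if is_sortable(x)), key=key))
    return [next(selected) if is_sortable(x) else x for x in items]

result = partial_sort([5, 'a', 3, 'b', 1], lambda x: isinstance(x, int))
print(result)
# [1, 'a', 3, 'b', 5]
```

In sizesorter the predicate is "the token's first word is a keyword" and the sort key is that word's index in the keywords list.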

Upvotes: 3
