Tonypz16
Tonypz16

Reputation: 47

Equivalent of Excel IFS/Search functions in Python Pandas with 1 value and multiple conditions

The following is an example of an Excel worksheet I am working with:

Product_Type Part_Number Description Size
Rod R-SS-015
Rod R-SS-030
Rod R-SS-045
Rod R-SS-060
Rod R-SS-075
Rod R-SS-090
Nut N-150 Stainless Steel 1" Nut
Nut N-151 Stainless Steel 2" Nut
Nut N-152 Stainless Steel 3" Nut
Washer W-101 Stainless Steel 1" Washer
Washer W-102 Stainless Steel 2" Washer
Washer W-103 Stainless Steel 3" Washer

What I am trying to do is get the size of the "Rod" from the Part_Number and the size of the "Nut/Washer" from the description. In Excel I would use a combination of IFS, Search, and Left/Mid/Right functions.

I am able to find the size of the product types separately but am having trouble connecting to the Product Type.

As in:

"If Product_Type is equal to Rod then search the Part_Number and output to Size"

or

"If Product_Type is equal to Nut or Washer then search the Description and output to Size"

For the Rod I am using the following code:

conditions = [
df.loc[df['Part_Number'].str.contains('-015'), 'Size'] == '1"',
df.loc[df['Part_Number'].str.contains('-030'), 'Size'] == '2"',
df.loc[df['Part_Number'].str.contains('-045'), 'Size'] == '3"',
df.loc[df['Part_Number'].str.contains('-060'), 'Size'] == '4"',
df.loc[df['Part_Number'].str.contains('-075'), 'Size'] == '5"',
df.loc[df['Part_Number'].str.contains('-090'), 'Size'] == '6"',
]
df['Size'] = np.select(df[Product_Type] == "Rod", conditions, '')

For Nut/Washer I am using the following code:

df_2 = df['Descritpion'].str.extract(r'(?:\s([\d+]?\.?[\d+])")', expand=False)
df['Size'] = np.where(df['Product_Type'] == "Nut" or df['Product_Type'] == "Washer", df_2, '')

For both of these codes I am getting the following error:

ValueError: list of cases must be same length as list of conditions

This is because there is only one value (Rod) and multiple conditions (the sizes) correct?

How can I combine both str.extract and str.contains?

Upvotes: 0

Views: 459

Answers (1)

Nick ODell
Nick ODell

Reputation: 25374

Here are three approaches which will produce the same result.

Approach #1: np.select()

This will work with a slight change to your np.select() example. The key thing to do is to make sure both your conditions and replacements are lists of the same length.

conditions = [
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-015')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-030')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-045')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-060')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-075')),
    (df['Product_Type'] == 'Rod') & (df['Part_Number'].str.contains('-090')),
    df['Product_Type'] == 'Nut',
    df['Product_Type'] == 'Washer'
]

replacements = [
    '1',
    '2',
    '3',
    '4',
    '5',
    '6',
    df['Description'].str.extract(r'\s(\d+)"', expand=False),
    df['Description'].str.extract(r'\s(\d+)"', expand=False),
]

df['Size'] = np.select(conditions, replacements, default=None).astype(float)

Approach #2: apply()

As a shorter and more flexible alternative, there's also .apply(..., axis=1) as a way to apply a function to each row.

Here I use if statements to decide which kind of product it is, then use a dictionary to convert the part number ending to the size you want.

def get_size(row):
    if row['Product_Type'] == 'Rod':
        rod_size = re.search(r'-(\d+)$', row['Part_Number']).group(1)
        rod_convert = {
            '015': 1,
            '030': 2,
            '045': 3,
            '060': 4,
            '075': 5,
            '090': 6,
        }
        return rod_convert[rod_size]
    elif row['Product_Type'] == 'Nut' or row['Product_Type'] == 'Washer':
        return re.search(r'(\d+)"', row['Description']).group(1)
    else:
        return None

df['Size'] = df.apply(get_size, axis=1).astype(float)

However, that flexibility comes at a speed penalty.

Approach #3: groupby()

This is similar to the previous approach, but where the previous solution runs a function on each row of the dataframe, this approach splits the dataframe up into three sections, and runs the function on each dataframe.

After splitting into three dataframes, I check which split this is (using df.name) and extract the size with a regex. In the case of Rods, I use map() to substitute the part number ending for the size you want.

def get_size(df):
    if df.name == 'Rod':
        size = df['Part_Number'].str.extract(r'-(\d+)$', expand=False)
        df['Size'] = size.map({
            '015': 1,
            '030': 2,
            '045': 3,
            '060': 4,
            '075': 5,
            '090': 6,
        })
        return df
    elif df.name == 'Nut' or df.name == 'Washer':
        df['Size'] = df['Description'].str.extract(r'(\d+)"', expand=False).astype(float)
        return df

df = df.groupby('Product_Type').apply(get_size)

Result

All three produce the same result, which is:

   Product_Type Part_Number                Description  Size
0           Rod    R-SS-015                        NaN   1.0
1           Rod    R-SS-030                        NaN   2.0
2           Rod    R-SS-045                        NaN   3.0
3           Rod    R-SS-060                        NaN   4.0
4           Rod    R-SS-075                        NaN   5.0
5           Rod    R-SS-090                        NaN   6.0
6           Nut       N-150     Stainless Steel 1" Nut   1.0
7           Nut       N-151     Stainless Steel 2" Nut   2.0
8           Nut       N-152     Stainless Steel 3" Nut   3.0
9        Washer       W-101  Stainless Steel 1" Washer   1.0
10       Washer       W-102  Stainless Steel 2" Washer   2.0
11       Washer       W-103  Stainless Steel 3" Washer   3.0

Upvotes: 1

Related Questions