Constantine1991

Reputation: 111

Modifying dataframe column based on another column values

I have a dataframe with two columns and want to modify one column based on the value of the other column.

Example

unit        name
feet        abcd_feet
celcius     abcd_celcius
yard        bcde_yard
yard        bcde

If the unit is feet or yard and the name ends with that unit, I want to remove the unit suffix from the name column.

unit        name
feet        abcd
celcius     abcd_celcius
yard        bcde
yard        bcde

Upvotes: 2

Views: 77

Answers (1)

RobinFrcd

Reputation: 5426

There are two possible ways of solving your problem:

First method, the faster one, since it relies on pandas' vectorized column operations:

UNITS_TO_REMOVE = {'feet', 'yard'}

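# Split on the last underscore only; .str[1] is NaN when the name has no underscore.
parts = df['name'].str.rsplit('_', n=1)
df['value_'], df['unit_'] = parts.str[0], parts.str[1]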
values_to_clean = (df['unit_'].isin(UNITS_TO_REMOVE)) & (df['unit_'] == df['unit'])
df.loc[values_to_clean, 'name'] = df.loc[values_to_clean, 'value_']
df.drop(columns=['unit_', 'value_'], inplace=True)

Here is the result:

    unit    name
0   feet    abcd
1   celcius abcd_celcius
2   yard    bcde
3   yard    bcde

Performance: 20 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each), measured on a (4000, 2) dataframe
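
For anyone who wants to try the first method end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question and wraps the same split-and-mask idea in a helper. The function name strip_unit_suffix and the choice to work on a copy are my own assumptions for illustration, not part of the original snippet.

```python
import pandas as pd

UNITS_TO_REMOVE = {'feet', 'yard'}


def strip_unit_suffix(df):
    """Return a copy of df with a trailing '_<unit>' removed from 'name'.

    Hypothetical helper: only rows whose unit is in UNITS_TO_REMOVE and
    whose name ends with that same unit are changed.
    """
    out = df.copy()
    # Split on the last underscore only, so names with several underscores work.
    parts = out['name'].str.rsplit('_', n=1)
    value = parts.str[0]
    suffix = parts.str[1]  # NaN when the name contains no underscore
    mask = out['unit'].isin(UNITS_TO_REMOVE) & (suffix == out['unit'])
    out.loc[mask, 'name'] = value[mask]
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    'unit': ['feet', 'celcius', 'yard', 'yard'],
    'name': ['abcd_feet', 'abcd_celcius', 'bcde_yard', 'bcde'],
})
result = strip_unit_suffix(df)
print(result)
```

Working on a copy keeps the input untouched, which makes it easier to compare before and after; drop the copy() call if you prefer to modify the dataframe in place.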


Second method, using apply (which is sometimes the only available solution):

UNITS_TO_REMOVE = {'feet', 'yard'}

def remove_unit(unit, value):
    if unit not in UNITS_TO_REMOVE or '_' not in value:
        return value
    else:
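        # rsplit with maxsplit=1 always yields two parts here, even if the name has several underscores
        row_value, row_unit = value.rsplit('_', 1)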
        if row_unit == unit:
            return row_value
        else:
            return value

df['name'] = df.apply(lambda row: remove_unit(row['unit'], row['name']), axis=1)

Output:


    unit    name
0   feet    abcd
1   celcius abcd_celcius
2   yard    bcde
3   yard    bcde

Performance: 152 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
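
As a possible variation on the row-wise approach (not something I timed, so treat it as a sketch), the same check can be written with str.endswith and a plain list comprehension over the two columns, which avoids building a row object for every call. The helper name remove_unit_suffix is hypothetical.

```python
import pandas as pd

UNITS_TO_REMOVE = {'feet', 'yard'}


def remove_unit_suffix(unit, name):
    """Drop a trailing '_<unit>' from name when unit is one of UNITS_TO_REMOVE."""
    suffix = '_' + unit
    if unit in UNITS_TO_REMOVE and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


# Sample data copied from the question.
df = pd.DataFrame({
    'unit': ['feet', 'celcius', 'yard', 'yard'],
    'name': ['abcd_feet', 'abcd_celcius', 'bcde_yard', 'bcde'],
})

# Iterate over both columns in lockstep instead of using apply(axis=1).
df['name'] = [remove_unit_suffix(u, n) for u, n in zip(df['unit'], df['name'])]
print(df)
```

Because it only looks at the end of the string, this version also handles names that contain more than one underscore.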

Upvotes: 1
