exonyz

Reputation: 61

Update values of a column conditionally, depending on another column, in pandas

I have a pandas dataframe as follows:

     name origin  delta
0     foo  raw_x      3
1     foo  raw_y      3
2     bar  raw_z      4
3     bar  raw_z      4
4  foobar  raw_a      1
5  foobar  raw_b      1
5 foobar raw_b  1

In the rows where name is bar, the origin values are the same, so I can simply drop the duplicates. But in the rows where name is foo (or foobar), the origin values differ. I want to update the name values in the same dataframe as follows:

foo -> foo_x where origin is raw_x
foo -> foo_y where origin is raw_y
foobar -> foobar_a where origin is raw_a
foobar -> foobar_b where origin is raw_b

A hard-coded check such as if name == 'foo' isn't possible, so the logic has to work from groups of rows that share the same value in the name column. How can this be done?
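Which names need renaming can be worked out without hard-coding any of them, by counting the distinct origin values per name. A minimal sketch with groupby(...).nunique(), rebuilding the sample frame above:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "name": ["foo", "foo", "bar", "bar", "foobar", "foobar"],
    "origin": ["raw_x", "raw_y", "raw_z", "raw_z", "raw_a", "raw_b"],
    "delta": [3, 3, 4, 4, 1, 1],
})

# Number of distinct origin values for each name; no name is hard-coded
origin_counts = df.groupby("name")["origin"].nunique()

# Names whose rows disagree on origin are the ones that need a suffix
to_rename = origin_counts[origin_counts > 1].index.tolist()
print(to_rename)  # ['foo', 'foobar']
```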

Upvotes: 1

Views: 30

Answers (1)

Nk03

Reputation: 14949

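IIUC, you can group by name and, for groups whose origin values differ, append the part of origin after the last underscore: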

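df['name'] = (
    df.groupby('name', sort=False)
    .apply(
        # Rename only groups with more than one distinct origin; the suffix is
        # the text after the last '_' in origin ('raw_x' -> 'x')
        lambda x: x['name'] + '_' + x['origin'].str.rsplit('_', n=1).str[1]
        if x['origin'].nunique() > 1
        else
        x['name']
    ).values  # positional: assumes each name's rows are contiguous in df
)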

Complete example:

import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
                   'origin': ['raw_x', 'raw_y', 'raw_z', 'raw_z', 'raw_a', 'raw_b'],
                   'delta': [3, 3, 4, 4, 1, 1]})

df['name'] = (
    df.groupby('name', sort=False)
    .apply(
        lambda x: x['name'] + '_' + x['origin'].str.rsplit('_', n=1).str[1]
        if x['origin'].nunique() > 1
        else
        x['name']
    ).values
)
print(df)
OUTPUT:
       name origin  delta
0     foo_x  raw_x      3
1     foo_y  raw_y      3
2       bar  raw_z      4
3       bar  raw_z      4
4  foobar_a  raw_a      1
5  foobar_b  raw_b      1
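The answer assigns the result positionally with .values, which lines rows up correctly only while each name's rows sit next to each other (true for this sample); newer pandas releases also deprecate letting groupby.apply operate on the grouping column, which the lambda does through x['name']. As an alternative to the answer's approach (not the answer's own method), here is a sketch that stays aligned on the index by building a mask with groupby(...).transform('nunique'). Like the answer, it assumes the suffix is whatever follows the last underscore in origin. The rows are hypothetically interleaved to show that order no longer matters:

```python
import pandas as pd

# Hypothetical interleaved version of the question's data: rows with the
# same name are no longer adjacent
df = pd.DataFrame({
    "name": ["foo", "bar", "foobar", "foo", "bar", "foobar"],
    "origin": ["raw_x", "raw_z", "raw_a", "raw_y", "raw_z", "raw_b"],
    "delta": [3, 4, 1, 3, 4, 1],
})

# True for every row whose name group has more than one distinct origin
mask = df.groupby("name")["origin"].transform("nunique") > 1

# Suffix = text after the last underscore in origin ("raw_x" -> "x")
suffix = df["origin"].str.rsplit("_", n=1).str[1]

# Rewrite only the masked rows; .loc aligns on the index, not on position
df.loc[mask, "name"] = df.loc[mask, "name"] + "_" + suffix[mask]
print(df)
```

If an origin value had no underscore, .str[1] would give NaN and the new name would be NaN too. To drop the identical bar rows afterwards, as the question mentions, df.drop_duplicates() keeps only the first of them.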

Upvotes: 1
