I have a pandas dataframe as follows:
name origin delta
0 foo raw_x 3
1 foo raw_y 3
2 bar raw_z 4
3 bar raw_z 4
4 foobar raw_a 1
5 foobar raw_b 1
In the rows where name = bar, the origin values are the same, so I can drop the duplicates. But in the rows where name = foo, the origin values are different. I want to modify/update the values in the same dataframe as follows:
foo -> foo_x where raw_x
foo -> foo_y where raw_y
foobar -> foobar_a where raw_a
foobar -> foobar_b where raw_b
A name check like if name == 'foo' isn't possible, so we'll have to go by groups of rows that share the same value in the name column. How can this be done?
IIUC, you can try:
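df['name'] = (
    df.groupby('name', sort=False)
    .apply(
        # Suffix the name with the part of origin after the last '_',
        # but only for groups whose origin values differ
        lambda x: x['name'] + '_' + x['origin'].str.rsplit('_', n=1).str[1]
        if x['origin'].nunique() > 1
        else x['name']
    ).values
)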
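The two pieces inside the lambda can be checked on their own. A minimal sketch on a subset of the question's data: nunique() counts distinct origins per name (only groups with more than one get a suffix), and str.rsplit('_', n=1).str[1] splits each origin once from the right and keeps the part after the last underscore.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar'],
    'origin': ['raw_x', 'raw_y', 'raw_z', 'raw_z'],
})

# Distinct origins per name: bar -> 1 (left alone), foo -> 2 (gets a suffix)
counts = df.groupby('name')['origin'].nunique()
print(counts.to_dict())

# Split once from the right on '_' and keep the piece after it
suffixes = df['origin'].str.rsplit('_', n=1).str[1]
print(suffixes.tolist())  # ['x', 'y', 'z', 'z']
```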
Complete example:
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'origin': ['raw_x', 'raw_y', 'raw_z', 'raw_z', 'raw_a', 'raw_b'],
    'delta': [3, 3, 4, 4, 1, 1],
})

df['name'] = (
    df.groupby('name', sort=False)
    .apply(
        # Suffix the name only when the group's origin values differ
        lambda x: x['name'] + '_' + x['origin'].str.rsplit('_', n=1).str[1]
        if x['origin'].nunique() > 1
        else x['name']
    ).values
)
print(df)
name origin delta
0 foo_x raw_x 3
1 foo_y raw_y 3
2 bar raw_z 4
3 bar raw_z 4
4 foobar_a raw_a 1
5 foobar_b raw_b 1
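Because .values assigns the apply result back by position, the approach above relies on each name's rows being contiguous, as they are here; depending on the pandas version, apply may return the rows in group order rather than row order, so with interleaved groups (e.g. foo, bar, foo) the values could land on the wrong rows. Recent pandas releases also warn that apply operating on the grouping column is deprecated. A sketch of an index-aligned alternative, swapping in groupby(...).transform('nunique') instead of apply (this is not the answer's method, just one way to get the same result):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'origin': ['raw_x', 'raw_y', 'raw_z', 'raw_z', 'raw_a', 'raw_b'],
    'delta': [3, 3, 4, 4, 1, 1],
})

# True for every row whose name group has more than one distinct origin
mixed = df.groupby('name')['origin'].transform('nunique') > 1

# Suffix = part of origin after the last '_' (e.g. 'raw_x' -> 'x')
suffix = df['origin'].str.rsplit('_', n=1).str[1]

# Rename only rows in mixed groups; boolean .loc aligns on the index, not position
df.loc[mixed, 'name'] = df.loc[mixed, 'name'] + '_' + suffix[mixed]
print(df)
```

Since transform returns a result aligned to the original index, row order and group contiguity no longer matter.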