ankitpandey
ankitpandey

Reputation: 349

New Column based on Group and Condition

I am trying to create new column based on the SOURCE column value for distinct ID. There are multiple records for same IDs with different or same Name as there source are different. I am trying to create new column CNAME to standardize the Name for each ID based on SOURCE column value.

For each ID, CNAME column should be equal to the NAME where SOURCE == A

df:

ID  NAME    SOURCE
1   ABC     B
1   ABC     C
1   AXY     A
2   XYZ     B
2   XYZ     A
3   SASA    D
3   SASA    B
3   SSA     A
3   SSA     C
4   BVA     A
4   BA      B
5   NAS     A
5   VAN     B

Output:

ID  NAME    SOURCE  CNAME
1   ABC     B       AXY
1   ABC     C       AXY
1   AXY     A       AXY
2   XYZ     B       XYZ
2   XYZ     A       XYZ
3   SASA    D       SSA
3   SASA    B       SSA
3   SSA     A       SSA
3   SSA     C       SSA
4   BVA     A       BVA
4   BA      B       BVA
5   NAS     A       NAS
5   VAN     B       NAS

After grouping, i am not able to understand how to fill back value to new column for all rows. I tried to make use of transform as well but no luck.

df.groupby('ID').apply(lambda x: np.where(x['SOURCE'] == 'A', x['NAME'],' '))

Upvotes: 0

Views: 160

Answers (4)

wwnde
wwnde

Reputation: 26676

sort_values, groupby ID and broadcast the first SOURCE in each group using transform

df['CNAME'] = df.sort_values(by=['ID','SOURCE']).groupby('ID')['NAME'].transform('first')

Following your clarifying question you can;

s=df.query("SOURCE=='A'")#Filter all the A now that they are not repeated

df['CNAME'] = df['ID'].map(dict(zip(s['ID'],s['NAME'])))#create dict and map


     ID  NAME SOURCE CNAME
0    1   ABC      B   AXY
1    1   ABC      C   AXY
2    1   AXY      A   AXY
3    2   XYZ      B   XYZ
4    2   XYZ      A   XYZ
5    3  SASA      D   SSA
6    3  SASA      B   SSA
7    3   SSA      A   SSA
8    3   SSA      C   SSA
9    4   BVA      A   BVA
10   4    BA      B   BVA
11   5   NAS      A   NAS
12   5   VAN      B   NAS

Upvotes: 2

Drakax
Drakax

Reputation: 1483

You can try this:

df = df.sort_values(by=['ID','SOURCE'])
df.loc[df['SOURCE'] == 'A','CNAME'] = df.NAME
df.CNAME = df['CNAME'].ffill()
df
index ID NAME SOURCE CNAME
2 1 AXY A AXY
0 1 ABC B AXY
1 1 ABC C AXY
4 2 XYZ A XYZ
3 2 XYZ B XYZ
7 3 SSA A SSA
6 3 SASA B SSA
8 3 SSA C SSA
5 3 SASA D SSA
9 4 BVA A BVA
10 4 BA B BVA
11 5 NAS A NAS
12 5 VAN B NAS

Upvotes: 0

Zaero Divide
Zaero Divide

Reputation: 799

This is not the proper way of doing it... but it works when you want to select a custom item, like SOURCE B in this case

mapper = df[df['SOURCE']=='B'].set_index('ID')['NAME']
mapper.name = 'NEWID'
df.merge(mapper, on='ID')

    ID  NAME SOURCE NEWID
0    1   ABC      B   ABC
1    1   ABC      C   ABC
2    1   AXY      A   ABC
3    2   XYZ      B   XYZ
4    2   XYZ      A   XYZ
5    3  SASA      D  SASA
6    3  SASA      B  SASA
7    3   SSA      A  SASA
8    3   SSA      C  SASA
9    4   BVA      A    BA
10   4    BA      B    BA
11   5   NAS      A   VAN
12   5   VAN      B   VAN

Upvotes: 0

topsail
topsail

Reputation: 3119

This is a little "sql-ish" (creating a lookup table if you will, then using it in a join or merge operation) but also works:

# get the list of CNAME ids
ids = df[df.SOURCE == 'A']

# join/merge the two dataframes
new_df = df.merge(ids, on='ID', how='left')

# capture the new columns from the joined dataframe
new_df = new_df[['ID', 'NAME_x', 'SOURCE_x', 'NAME_y']]

# rename the columns
new_df.columns = ['ID', 'NAME', 'SOURCE', 'CNAME']

Upvotes: 0

Related Questions