Mohammad Amir
Mohammad Amir

Reputation: 133

Making a column of STATE

I have a data like this -

ID     CITY     STATE     Location
1       a         b          a
1       c         w          a
2       e         b          e
2       a         f          e

I want a new column of STATE such that when CITY and Location are same then STATE is "b" for all ID's.

Expected Output:

ID     CITY     STATE     Location     STATE2
1       a         b          a           b
1       c         d          w           b 
2       e         m          e           m
2       u         f          e           m

Upvotes: 0

Views: 234

Answers (3)

Mayank Porwal
Mayank Porwal

Reputation: 34086

Use numpy.where:

In [1724]: import numpy as np

In [1732]: x = df.loc[df.CITY == df.Location, 'STATE'].unique()[0]

In [1725]: df['STATE2'] = np.where(df.CITY == df.Location, df.STATE, x)

In [1726]: df
Out[1726]: 
   ID CITY STATE Location STATE2
0   1    a     b        a      b
1   2    c     d        a      b
2   3    e     b        a      b
3   4    a     b        a      b

Upvotes: 1

Muralidharan M
Muralidharan M

Reputation: 27

Solution 1:

df.loc[df["City"]==df["Location"],"state2"] = "b"

Solution 2: if you would like to specify a value for rows that do no match

df.apply(lambda x: "b" if x["City"]==x["Location"] else "others",axis=1 )

Solution 3: if the conditions are more complex then its better to use a mask

mask = (df['City']==df["Location"])
df["state3"] = df["state3"].mask(mask,"b")

Upvotes: 0

David Erickson
David Erickson

Reputation: 16683

Please see two solutions depending on what you are trying to do.

Solution #1: Assuming the input:

   ID CITY STATE Location
0   1    a     b        a
1   1    c     d        w
2   2    e     m        e
3   2    u     f        e

You can use mask to change STATE values for ID that have at least one matching row of CITY and Location:

m = df.loc[df['CITY'] == df['Location'], 'ID']
df['STATE 2'] = df['STATE'].mask(df['ID'].isin(m),
                                 df.groupby('ID')['STATE'].transform('first'))
df
Out[1]: 
   ID CITY STATE Location STATE 2
0   1    a     b        a       b
1   1    c     d        w       b
2   2    e     m        e       m
3   2    u     f        e       m

Solution #2:

Assuming slightly different input (the approach changes a good amount depending on desired output):

   ID CITY STATE Location
0   1    g     f        c
1   1    a     b        a
2   1    c     d        w
3   2    e     m        e
4   2    u     f        e

m = df.loc[df['CITY'] == df['Location']].drop_duplicates('CITY').index
df['State 2'] = np.where(df.index.isin(m), df['STATE'], np.nan)
df['State 2'] = df.groupby('ID')['State 2'].transform('first')
df
Out[2]: 
   ID CITY STATE Location State 2
0   1    g     f        c       b
1   1    a     b        a       b
2   1    c     d        w       b
3   2    e     m        e       m
4   2    u     f        e       m

This solution assumes you want the output to be the first row within each group that matches. Please see my comment. One of these two solutions are hopefully what you are looking for.

Upvotes: 3

Related Questions