Reputation: 139

Creating a new column in pandas based on key-value relationships

I have a dataframe that contains a series like this:

0    CollgCr
1    Veenker
2    CollgCr
3    Crawfor
4    NoRidge
5    Mitchel
6    Somerst
7     NWAmes
8    OldTown
9    BrkSide

Based on that series, I wanted to create a new column (feature) by grouping them.

For example, if the data is CollgCr or Veenker, then the value in the new column would be "Middle"

I tried using the following code

df_full['NeighborGrp'] = "Upper"
df_full['NeighborGrp'].loc[df_full["Neighborhood"] == "CollgCr"] = "Middle"
df_full['NeighborGrp'].loc[df_full["Neighborhood"] == ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes", "IDOTRR",
                                                          "MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]] = "Lower"

The first and second line works well, but the third row returns an error "ValueError: Arrays were different lengths"

Is there any special syntax in padas that allow me to create a new column based on multiple value condition like this?

Thanks

Upvotes: 1

Answers (3)

jezrael

Reputation: 863481

Use Series.map by dictionary with fillna for unmatched values:

d =  {'CollgCr': 'Middle',
      'Veenker': 'Middle',
      "Mitchel": 'Lower',
      "OldTown": 'Lower',
      "BrkSide": 'Lower',
       "Sawyer": 'Lower',
        "NAmes": 'Lower',
       "IDOTRR": 'Lower',
      "MeadowV": 'Lower',
      "Edwards": 'Lower',
      "NPkVill": 'Lower',
       "BrDale": 'Lower',
        "SWISU": 'Lower',
      "Blueste": 'Lower'}

Or create dictionary dynamic:

Mi = ['CollgCr', 'Veenker']
Lo = ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes", "IDOTRR",
      "MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]

d = {**dict.fromkeys(Lo, 'Lower'), **dict.fromkeys(Mi, 'Middle')}

df_full['new'] = df_full['city'].map(d).fillna('Upper')
print (df_full)
        city      new
0     CollgCr  Middle
1     Veenker  Middle
2     CollgCr  Middle
3     Crawfor   Upper
4     NoRidge   Upper
5     Mitchel   Lower
6     Somerst   Upper
7      NWAmes   Upper
8     OldTown   Lower
9     BrkSide   Lower

It depends of data, but map should be fastest:

In [25]: %timeit (jez(df_full.copy()))
15 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [26]: %timeit (raf(df_full.copy()))
20.3 ms ± 347 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [27]: %timeit (ct(df_full.copy()))
26.9 ms ± 286 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Code for timings:

df_full = pd.DataFrame({'city': ['CollgCr', 'Veenker', 'CollgCr', 'Crawfor',
                            'NoRidge', 'Mitchel', 'Somerst', 'NWAmes',
                            'OldTown', 'BrkSide']})

#[100000 rows x 1 columns]
df_full = pd.concat([df_full] * 10000, ignore_index=True)


def jez(df_full):

    d =  {'CollgCr': 'Middle',
         'Veenker': 'Middle',
         "Mitchel": 'Lower',
         "OldTown": 'Lower',
         "BrkSide": 'Lower',
         "Sawyer": 'Lower',
         "NAmes": 'Lower',
         "IDOTRR": 'Lower',
         "MeadowV": 'Lower',
         "Edwards": 'Lower',
         "NPkVill": 'Lower',
         "BrDale": 'Lower',
         "SWISU": 'Lower',
         "Blueste": 'Lower'}

    df_full['new'] = df_full['city'].map(d).fillna('Upper')
    return df_full

def raf(df):

    m = ['CollgCr', 'Veenker']
    l = ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes", 
         "IDOTRR","MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]

    df['new_col'] = np.select([df.city.isin(l), df.city.isin(m)],
                              ['lower', 'middle'], default='upper')
    return df

def ct(df):
    df_types = pd.DataFrame({'CollgCr': 'Middle',
                             'Veenker': 'Middle',
                             "Mitchel": 'Lower',
                             "OldTown": 'Lower',
                             "BrkSide": 'Lower',
                             "Sawyer": 'Lower',
                             "NAmes": 'Lower',
                             "IDOTRR": 'Lower',
                             "MeadowV": 'Lower',
                             "Edwards": 'Lower',
                             "NPkVill": 'Lower',
                             "BrDale": 'Lower',
                             "SWISU": 'Lower',
                             "Blueste": 'Lower'}, index=['Type']).T



    return df.merge(df_types, left_on='city', right_index=True, how='left').fillna('Upper')

print (jez(df_full.copy()))
print (raf(df_full.copy()))
print (ct(df_full.copy()))

Upvotes: 1

CT Zhu

Reputation: 54380

If you have a table that can represent the relationship between City and Type, merge would be a much more straight forward way (without having to hardcode each city in your script):

In [52]: # In reality you probably should prepare the table elsewhere and read it in as a pandas dataframe
df_types = pd.DataFrame({'CollgCr': 'Middle',
                         'Veenker': 'Middle',
                         "Mitchel": 'Lower',
                         "OldTown": 'Lower',
                         "BrkSide": 'Lower',
                         "Sawyer": 'Lower',
                         "NAmes": 'Lower',
                         "IDOTRR": 'Lower',
                         "MeadowV": 'Lower',
                         "Edwards": 'Lower',
                         "NPkVill": 'Lower',
                         "BrDale": 'Lower',
                         "SWISU": 'Lower',
                         "Blueste": 'Lower'}, index=['Type']).T

df = pd.DataFrame({'city': ['CollgCr', 'Veenker', 'CollgCr', 'Crawfor',
                            'NoRidge', 'Mitchel', 'Somerst', 'NWAmes',
                            'OldTown', 'BrkSide']})

df.merge(df_types, left_on='city', right_index=True, how='left').fillna('Upper')

Out[52]:
      city    Type
0  CollgCr  Middle
1  Veenker  Middle
2  CollgCr  Middle
3  Crawfor   Upper
4  NoRidge   Upper
5  Mitchel   Lower
6  Somerst   Upper
7   NWAmes   Upper
8  OldTown   Lower
9  BrkSide   Lower

Upvotes: 1

rafaelc

Reputation: 59284

Use np.select and pd.Series.isin

m = ['CollgCr', 'Veenker']
l = ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes", "IDOTRR","MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]

df['new_col'] = np.select([df.city.isin(l), df.city.isin(m)], ['lower', 'middle'], default='upper')


    city    new_col
0   CollgCr middle
1   Veenker middle
2   CollgCr middle
3   Crawfor upper
4   NoRidge upper
5   Mitchel lower
6   Somerst upper
7   NWAmes  upper
8   OldTown lower
9   BrkSide lower

Upvotes: 3

Creating a new column in pandas based on key-value relationships

Answers (3)

Related Questions