e9e9s
e9e9s

Reputation: 955

How to create new column based on multiple conditions from existing column pandas

So I have a df column of 9 digit IDs. There are no duplicates and each ID starts with a different number that ranges from 1-6 -- depending on the number each ID starts with I want to create a separate column with the "name" that the first number of the ID represents. (e.g. IDs that start with 1 represent Maine, IDs that start with 2 represent California... and so on)

This works if it was only 2 conditions:

df['id_label'] = ['name_1' if name.startswith('1') else 'everything_else' for name in df['col_1']]

I couldn't figure out how to create a multi line line comprehension for what I need so I thought this would work, but it only creates the id_label column from the last iteration of the loop (i.e. the id_label column will only contain 'name_5):

for col in df['col_1']:
    if col.startswith('1'):
        df['id_label'] = 'name_1'
    if col.startswith('2'):
        df['id_label'] = 'name_2'
    if col.startswith('3'):
       df['id_label'] = 'name_3'
    if col.startswith('4'):
        df['id_label'] = 'name_4'
    if col.startswith('5'):
        df['id_label'] = 'name_5'
    if col.startswith('6'):
        df['id_label'] = 'name_5'

My question is how can I create a new column from an old column based on multiple conditional statements?

Upvotes: 2

Views: 9240

Answers (3)

Preetham
Preetham

Reputation: 577

Can you check this and let me know whether its suitable for your question.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})

df['col_2'] = df['col_1'].astype(str).str[0]

condlist = [df['col_2'] == "1",
            df['col_2'] == "2",
            df['col_2'] == "3",
            df['col_2'] == "4",
            ((df['col_2'] == "5") | (df['col_2'] == "6")),
            ]
choicelist = ['Maine','California','India', 'Frnace','5/6']

df['id_label'] = np.select(condlist, choicelist)

print(df)

#### Output ####

   col_1 col_2    id_label
0    133     1       Maine
1    255     2  California
2     36     3       India
3    477     4      Frnace
4     55     5         5/6
5     63     6         5/6

PS: Thanks for @ALollz who introduced me to np.select

Upvotes: 1

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

You can use apply in case you have lot of if elses

def ifef(col):
    col = str(col)
    if col.startswith('1'):
        return  'name_1'
    if col.startswith('2'):
        return 'name_2'
    if col.startswith('3'):
        return 'name_3'
    if col.startswith('4'):
        return'name_4'
    if col.startswith('5'):
        return 'name_5'
    if col.startswith('6'):
        return 'name_5'
df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
df['id_label'] = df['col_1'].apply(ifef)
   col_1 id_label
0    133   name_1
1    255   name_2
2     36   name_3
3    477   name_4
4     55   name_5
5     63   name_5

In case if you have a dictionaary you can use

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
d = {'1':'M', '2': 'C', '3':'a', '4':'f', '5':'r', '6':'s'}
def ifef(col):
    col = str(col)
    return d[col[0]]

df['id_label'] = df['col_1'].apply(ifef)
print(df)
  col_1 id_label
0    133        M
1    255        C
2     36        a
3    477        f
4     55        r
5     63        s

Upvotes: 2

jezrael
jezrael

Reputation: 863216

I think you can convert column to str by astype, select first value and last map by dict:

df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
print (df)

d = {'1':'Maine', '2': 'California', '3':'a', '4':'f', '5':'r', '6':'r'}
df['id_label'] = df['col_1'].astype(str).str[0].map(d)
print (df)
   col_1    id_label
0    133       Maine
1    255  California
2     36           a
3    477           f
4     55           r
5     63           r

Upvotes: 2

Related Questions