Reputation: 955
So I have a df column of 9 digit IDs. There are no duplicates and each ID starts with a different number that ranges from 1-6 -- depending on the number each ID starts with I want to create a separate column with the "name" that the first number of the ID represents. (e.g. IDs that start with 1 represent Maine, IDs that start with 2 represent California... and so on)
This works if it was only 2 conditions:
df['id_label'] = ['name_1' if name.startswith('1') else 'everything_else' for name in df['col_1']]
I couldn't figure out how to create a multi line line comprehension for what I need so I thought this would work, but it only creates the id_label
column from the last iteration of the loop (i.e. the id_label
column will only contain 'name_5
):
for col in df['col_1']:
if col.startswith('1'):
df['id_label'] = 'name_1'
if col.startswith('2'):
df['id_label'] = 'name_2'
if col.startswith('3'):
df['id_label'] = 'name_3'
if col.startswith('4'):
df['id_label'] = 'name_4'
if col.startswith('5'):
df['id_label'] = 'name_5'
if col.startswith('6'):
df['id_label'] = 'name_5'
My question is how can I create a new column from an old column based on multiple conditional statements?
Upvotes: 2
Views: 9240
Reputation: 577
Can you check this and let me know whether its suitable for your question.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
df['col_2'] = df['col_1'].astype(str).str[0]
condlist = [df['col_2'] == "1",
df['col_2'] == "2",
df['col_2'] == "3",
df['col_2'] == "4",
((df['col_2'] == "5") | (df['col_2'] == "6")),
]
choicelist = ['Maine','California','India', 'Frnace','5/6']
df['id_label'] = np.select(condlist, choicelist)
print(df)
#### Output ####
col_1 col_2 id_label
0 133 1 Maine
1 255 2 California
2 36 3 India
3 477 4 Frnace
4 55 5 5/6
5 63 6 5/6
PS: Thanks for @ALollz who introduced me to np.select
Upvotes: 1
Reputation: 30605
You can use apply
in case you have lot of if elses
def ifef(col):
col = str(col)
if col.startswith('1'):
return 'name_1'
if col.startswith('2'):
return 'name_2'
if col.startswith('3'):
return 'name_3'
if col.startswith('4'):
return'name_4'
if col.startswith('5'):
return 'name_5'
if col.startswith('6'):
return 'name_5'
df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
df['id_label'] = df['col_1'].apply(ifef)
col_1 id_label 0 133 name_1 1 255 name_2 2 36 name_3 3 477 name_4 4 55 name_5 5 63 name_5
In case if you have a dictionaary you can use
df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
d = {'1':'M', '2': 'C', '3':'a', '4':'f', '5':'r', '6':'s'}
def ifef(col):
col = str(col)
return d[col[0]]
df['id_label'] = df['col_1'].apply(ifef)
print(df)
col_1 id_label 0 133 M 1 255 C 2 36 a 3 477 f 4 55 r 5 63 s
Upvotes: 2
Reputation: 863216
I think you can convert column to str
by astype
, select first value and last map
by dict
:
df = pd.DataFrame({'col_1':[133,255,36,477,55,63]})
print (df)
d = {'1':'Maine', '2': 'California', '3':'a', '4':'f', '5':'r', '6':'r'}
df['id_label'] = df['col_1'].astype(str).str[0].map(d)
print (df)
col_1 id_label
0 133 Maine
1 255 California
2 36 a
3 477 f
4 55 r
5 63 r
Upvotes: 2