Reputation: 715
Using pandas and numpy, I am trying to process a column in a dataframe and create a new column with values derived from it. For example, if column x contains the value 1, the new column should contain a; for the value 2 it should contain b, and so on.
I can do this for a single condition, e.g.:
df['new_col'] = np.where(df['col_1'] == 1, a, n/a)
And I can find examples of multiple conditions, e.g. if x = 3 or x = 4 the value should be a, but not how to do something like: if x = 3 the value should be a, and if x = 4 the value should be c.
I tried simply running two lines of code, such as:
df['new_col'] = np.where(df['col_1'] == 1, a, n/a)
df['new_col'] = np.where(df['col_1'] == 2, b, n/a)
But obviously the second line overwrites the result of the first. Am I missing something crucial?
Upvotes: 11
Views: 23543
Reputation: 863166
I think you can use loc:
df.loc[df['col_1'] == 1, 'new_col'] = a
df.loc[df['col_1'] == 2, 'new_col'] = b
Or:
df['new_col'] = np.where(df['col_1'] == 1, a, np.where(df['col_1'] == 2, b, np.nan))
Or numpy.select:
df['new_col'] = np.select([df['col_1'] == 1, df['col_1'] == 2],[a, b], default=np.nan)
Or use Series.map, which returns NaN by default when there is no match:
d = {1: 'a', 2: 'b'}
df['new_col'] = df['col_1'].map(d)
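To make the options above concrete, here is a minimal runnable sketch. The sample frame and the string labels 'a', 'b' and 'n/a' are assumptions for illustration only (the question leaves a and b undefined); it runs numpy.select and Series.map on the same column so the results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'a' and 'b' stand in for the question's a and b.
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# np.select: the first matching condition wins; unmatched rows get the default.
conditions = [df['col_1'] == 1, df['col_1'] == 2]
labels = ['a', 'b']
df['via_select'] = np.select(conditions, labels, default='n/a')

# Series.map: one dict lookup per value; unmatched rows become NaN.
df['via_map'] = df['col_1'].map({1: 'a', 2: 'b'})

print(df)
```

A string default is used with np.select here so that the labels and the default share one type; Series.map needs no default because missing keys simply become NaN.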
Upvotes: 19
Reputation: 9018
Use the pandas Series.map instead of where.
import pandas as pd
df = pd.DataFrame({'col_1' : [1,2,4,2]})
print(df)
def ab_ify(v):
    if v == 1:
        return 'a'
    elif v == 2:
        return 'b'
    else:
        return None
df['new_col'] = df['col_1'].map(ab_ify)
print(df)
# output:
#
#    col_1
# 0      1
# 1      2
# 2      4
# 3      2
#    col_1 new_col
# 0      1       a
# 1      2       b
# 2      4    None
# 3      2       b
Upvotes: 1
Reputation: 31
You could define a dict with your desired transformations, then loop through the DataFrame column and fill in the new one.
There may be more elegant ways, but this will work:
import numpy as np
import pandas as pd

# create a dummy DataFrame
df = pd.DataFrame(np.random.randint(2, size=(6, 4)), columns=['col_1', 'col_2', 'col_3', 'col_4'], index=range(6))
# create a dict with your desired substitutions:
swap_dict = {0: 'a',
             1: 'b',
             999: 'zzz'}
# introduce new column and fill with swapped information:
for i in df.index:
    df.loc[i, 'new_col'] = swap_dict[df.loc[i, 'col_1']]
print(df)
returns something like:
col_1 col_2 col_3 col_4 new_col
0 1 1 1 1 b
1 1 1 1 1 b
2 0 1 1 0 a
3 0 1 0 0 a
4 0 0 1 1 a
5 0 0 1 0 a
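One caveat with the loop: swap_dict[...] raises a KeyError as soon as col_1 holds a value that is not a key in the dict. A possible vectorized variant is sketched below; it swaps the loop for Series.map, and the sample values and the 'unknown' fallback label are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; 7 is deliberately missing from swap_dict.
df = pd.DataFrame({'col_1': [0, 1, 7, 0]})
swap_dict = {0: 'a', 1: 'b', 999: 'zzz'}

# Series.map performs the dict lookup for the whole column at once;
# values absent from the dict become NaN instead of raising KeyError.
df['new_col'] = df['col_1'].map(swap_dict)

# Optionally replace the NaN placeholders with an explicit fallback label.
df['new_col_filled'] = df['new_col'].fillna('unknown')
print(df)
```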
Upvotes: 1
Reputation: 12620
I think numpy choose() is the best option for you.
import numpy as np
choices = 'abcde'
N = 10
np.random.seed(0)
data = np.random.randint(1, len(choices) + 1, size=N)
print(data)
print(np.choose(data - 1, choices))
Output:
[5 1 4 4 4 2 4 3 5 1]
['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']
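For comparison, the same lookup can probably also be written with plain integer-array indexing instead of choose(). This is a sketch of that alternative technique, not part of the approach above; the labels array and the sample data are assumptions for illustration.

```python
import numpy as np

# One label per code 1..5, stored as an array so it can be indexed by position.
labels = np.array(list('abcde'))
data = np.array([5, 1, 4, 2, 3])

# Integer-array indexing: subtract 1 to turn codes 1..5 into positions 0..4.
result = labels[data - 1]
print(result)
```

As with choose(), this only works when the codes form a dense range of integers; for arbitrary keys a dict-based mapping is the more natural fit.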
Upvotes: 3