Reputation: 67
I am trying to create a new column in pandas data frame based on existing columns.
column1 | column2 | column3 | y1 | y2 | y3 |
---|---|---|---|---|---|
100 | 200 | 300 | 2020 | 2021 | 2022 |
100 | 200 | 300 | 2021 | 2022 | 2023 |
100 | 200 | 300 | 2019 | 2020 | 2021 |
I want a new columns vals which take values of col1 if currentyear = y1 and col2 if currentyear = y2
column1 | column2 | column3 | y1 | y2 | y3 | Vals |
---|---|---|---|---|---|---|
100 | 200 | 300 | 2020 | 2021 | 2022 | 200 |
100 | 200 | 300 | 2021 | 2022 | 2023 | 100 |
100 | 200 | 300 | 2019 | 2020 | 2021 | 300 |
I am trying below code :
def assignvalues(df):
if df['y1'] == currentyear:
df['Vals'] = df['col1']
elif df['y2'] == currentyear:
df['Vals'] = df['col2']
elif df['y3'] == currentyear:
df['Vals'] = df['col3']
df.apply(assignvalues)
It does create columns but doesn't store any values.
Upvotes: 1
Views: 1316
Reputation: 23217
You can use np.select()
for faster execution, as follows:
import numpy as np
currentyear = pd.to_datetime('now').year
condlist = [df['y1'] == currentyear,
df['y2'] == currentyear,
df['y3'] == currentyear]
choicelist = [df['column1'],
df['column2'],
df['column3']]
df['Vals'] = np.select(condlist, choicelist, default=np.nan)
The default value if no match of currentyear is NaN
, you can set it to 0 or other values at your choice by modifying the default=
parameter.
Result:
print(df)
column1 column2 column3 y1 y2 y3 Vals
0 100 200 300 2020 2021 2022 200
1 100 200 300 2021 2022 2023 100
2 100 200 300 2019 2020 2021 300
Upvotes: 1
Reputation: 3455
From your DataFrame
:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
... column1,column2,column3,y1,y2,y3
... 100,200,300,2020,2021,2022
... 100,200,300,2021,2022,2023
... 100,200,300,2019,2020,2021"""))
>>> df
column1 column2 column3 y1 y2 y3
0 100 200 300 2020 2021 2022
1 100 200 300 2021 2022 2023
2 100 200 300 2019 2020 2021
And the function assignvalues
, which now return the value from the expected column for each if
. We set the currentyear
at 2021
for example :
>>> def assignvalues(df):
... if df['y1'] == currentyear:
... return df['column1']
... elif df['y2'] == currentyear:
... return df['column2']
... elif df['y3'] == currentyear:
... return df['column3']
>>> currentyear = 2021
We can assign to df["Vals"]
an apply()
, as you did, with an axis=1
parameter to get the expected result :
>>> df["Vals"] = df.apply(assignvalues, axis=1)
>>> df
column1 column2 column3 y1 y2 y3 Vals
0 100 200 300 2020 2021 2022 200
1 100 200 300 2021 2022 2023 100
2 100 200 300 2019 2020 2021 300
Upvotes: 1