Reputation: 21
This is my first question on StackOverflow, so I apologize if the formatting isn't perfect.
I've concatenated multiple dataframes and now I can't figure out how to create a new column, df["Population"], based on values from other columns: df["2013 Pop"], df["2014 Pop"], etc. For example, if the event occurred in 2014, meaning df["Year"] == 2014, I want to take the population from the df["2014 Pop"] column and put it into the new df["Population"] column. I know I'm explaining this badly; I'm just frustrated over something I feel I should be able to do easily. Here's a summary of the dataframe and what I've tried so far.
d = {
    "Year" : [2013,2014,2015...],
    "State" : ["Louisiana", "Texas", "California"... ],
    "City" : ["New Orleans", "Dallas", "Sacramento"...],
    "Number Killed" : [4,6,2,4],
    "Safety Grade" : ["A", "B", "C", "D"...],
    "2013 Pop" : [421329, 232321, 2454543....],
    "2014 Pop" : [454545, 655654, 3421342....],
    "2015 Pop" : [142314, 454355, 4324323....],
    "Incident Date(datetime dtype)" : [12-29-2014, 3-12-2017...]
}
df = pd.DataFrame(d)
I've tried mapping, loc, and apply, and I just can't find a solution. I think I'm on the right track with defining a function with conditionals, but it raises an error.
def categorise(row):
    if row["Year"] == 2014:
        return df["2014 Pop"]
    elif row["Year"] == 2015:
        return df["2015 Pop"]
    elif row["Year"] == 2016:
        return df["2016 Pop"]
    elif row["Year"] == 2017:
        return df["2017 Pop"]
    else:
        return "NONE"
When I try this:
df["Population"] = df.apply(lambda row : categorise(row), axis = 1)
I get a ValueError: "Wrong number of items passed 3609, placement implies 1", where 3609 is the length of the df.
Does anyone have a suggestion for how to create the df["Population"] column based on my poorly worded question?
Upvotes: 2
Views: 61
Reputation: 30002
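You should change df to row in your categorise function. df["2014 Pop"] is the entire column, so every call returned a Series of length 3609 instead of one value; row["2014 Pop"] is the single value for the current row.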
def categorise(row):
    if row["Year"] == 2014:
        return row["2014 Pop"]
    elif row["Year"] == 2015:
        return row["2015 Pop"]
    elif row["Year"] == 2016:
        return row["2016 Pop"]
    elif row["Year"] == 2017:
        return row["2017 Pop"]
    else:
        return "NONE"

df["Population"] = df.apply(categorise, axis = 1)
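To see the difference between returning from df and returning from row, it may help to run both variants on a toy frame. The sketch below uses invented three-row data (not the asker's), and the helper names broken and fixed are hypothetical. It also assumes every row's year has a matching "<year> Pop" column, which lets the column name be built with an f-string instead of an if/elif chain.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Year": [2014, 2015, 2014],
    "2014 Pop": [100, 200, 300],
    "2015 Pop": [110, 210, 310],
})

def broken(row):
    # Looks up the WHOLE column on df: a Series of len(df) for every row.
    return df[f"{row['Year']} Pop"]

def fixed(row):
    # Looks up the single value in the current row.
    return row[f"{row['Year']} Pop"]

wide = df.apply(broken, axis=1)       # each row expands into len(df) columns
population = df.apply(fixed, axis=1)  # one scalar per row

print(wide.shape)
print(population.tolist())
```

The first result is a whole DataFrame rather than a single column, which is why it cannot be assigned to df["Population"]; the second is an ordinary Series that can.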
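Or use np.select
import numpy as np

df["Population"] = np.select(
    [
        df["Year"] == 2014,
        df["Year"] == 2015,
        df["Year"] == 2016,
        df["Year"] == 2017,
    ],
    [
        df["2014 Pop"],
        df["2015 Pop"],
        df["2016 Pop"],
        df["2017 Pop"],
    ],
    # np.nan keeps the result numeric; a string default such as 'NONE'
    # does not share a dtype with the numeric population columns.
    default=np.nan,
)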
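As a rough, self-contained illustration of the np.select approach, the sketch below builds the condition and choice lists from a list of years rather than writing each one out. The data is invented, the years list is an assumption about which columns exist, and np.nan is used as the fallback so the result stays numeric.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; the 2013 row has no matching column.
df = pd.DataFrame({
    "Year": [2014, 2015, 2013],
    "2014 Pop": [100, 200, 300],
    "2015 Pop": [110, 210, 310],
})

years = [2014, 2015]  # assumed: the years that have a "<year> Pop" column
conditions = [df["Year"] == y for y in years]
choices = [df[f"{y} Pop"] for y in years]

# Rows whose year matches none of the conditions fall back to NaN.
df["Population"] = np.select(conditions, choices, default=np.nan)

print(df["Population"].tolist())
```

Unlike apply with axis=1, this evaluates whole columns at once, which tends to matter more as the frame grows.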
Or with pd.factorize
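# idx: for each row, the position of its year in cols; cols: the unique years
idx, cols = pd.factorize(df['Year'])
# Keep only the "<year> Pop" columns and rename them to the integer year
pop = df.filter(like='Pop').rename(columns=lambda x: int(x.split(' ')[0]))
# Order the columns like cols, then pick one value per row by position
out = pop.reindex(cols, axis=1).to_numpy()[np.arange(len(pop)), idx]
df['Population'] = out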
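The factorize version is the least obvious of the three, so here is a small walk-through on invented data. It is only a sketch: it assumes the population columns are the only ones whose names contain "Pop" and that each name starts with the year followed by a space.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Year": [2014, 2015, 2014],
    "2014 Pop": [100, 200, 300],
    "2015 Pop": [110, 210, 310],
})

# idx gives, per row, the position of that row's year among the unique years.
idx, cols = pd.factorize(df["Year"])

# Rename "2014 Pop" -> 2014 and so on, so column labels match the Year values.
pop = df.filter(like="Pop").rename(columns=lambda name: int(name.split(" ")[0]))

# Put the columns in the same order as the unique years, then use
# (row number, column position) pairs to pull one value per row.
values = pop.reindex(cols, axis=1).to_numpy()
df["Population"] = values[np.arange(len(df)), idx]

print(df["Population"].tolist())
```

A year with no matching column would show up as NaN here, because reindex adds an empty column for any label it cannot find.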
Upvotes: 1