Creating a new dataframe column based on row values from multiple columns

Question

This is my first question on StackOverflow, so I apologize if the formatting isn't perfect.

I've concatenated multiple dataframes and now I can't figure out how to create a new column, df["Population"], based on values from other columns: df["2013 Pop"], df["2014 Pop"], etc. For example, if the event occurred in 2014, meaning df["Year"] == 2014, I want to take the population from the df["2014 Pop"] column and put it into the new df["Population"] column. I know I'm explaining this badly; I'm just frustrated over something I feel I should be able to do easily. Here's a summary of the dataframe and what I've tried so far.

d = {
    "Year" : [2013,2014,2015...],
    "State" : ["Louisiana", "Texas", "California"... ],
    "City" : ["New Orleans", "Dallas", "Sacramento"...],
    "Number Killed" : [4,6,2,4],
    "Safety Grade" : ["A", "B", "C", "D"...],
    "2013 Pop" : [421329, 232321, 2454543....],
    "2014 Pop" : [454545, 655654, 3421342....],
    "2015 Pop" : [142314, 454355, 4324323....],
    "Incident Date(datetime dtype)" : [12-29-2014, 3-12-2017...]
}

df = pd.DataFrame(d)

I've tried map, loc, and apply, and I just can't find a solution. I think I'm on the right track with defining a function with conditionals, but it throws an error.

def categorise(row):
  if row["Year"] == 2014:
    return df["2014 Pop"]
  elif row["Year"] == 2015:
    return df["2015 Pop"]
  elif row["Year"] == 2016:
    return df["2016 Pop"]
  elif row["Year"] == 2017:
    return df["2017 Pop"]
  else:
    return "NONE"

When I try this:

df["Population"] = df.apply(lambda row : categorise(row), axis = 1)

I get the ValueError "Wrong number of items passed 3609, placement implies 1" (3609 is the length of the df).

Does anyone have a suggestion for how to create the df["Population"] column based on my poorly worded question?
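
A likely cause of the error, judging from the code above: categorise returns df["2014 Pop"], which is the whole column (all 3609 values), instead of row["2014 Pop"], which is the single value for the row being processed. Below is a minimal sketch of one way around it. The sample data and the helper name population_for_year are made up for illustration, and it assumes the population columns are always named "<year> Pop".

```python
import pandas as pd

# Hypothetical sample data; the real frame has 3609 rows and more columns.
df = pd.DataFrame({
    "Year": [2013, 2014, 2015, 2016],
    "City": ["New Orleans", "Dallas", "Sacramento", "Dallas"],
    "2013 Pop": [421329, 232321, 2454543, 232321],
    "2014 Pop": [454545, 655654, 3421342, 655654],
    "2015 Pop": [142314, 454355, 4324323, 454355],
})

def population_for_year(row):
    # Build the column name from the row's own year, e.g. "2014 Pop".
    col = f"{int(row['Year'])} Pop"
    # Read the value from the row, not from df, so a single value is returned.
    # Years without a matching column (2016 here) yield None.
    return row[col] if col in row.index else None

df["Population"] = df.apply(population_for_year, axis=1)
print(df[["Year", "City", "Population"]])
```

Building the column name from the year avoids one if/elif branch per year. On a large frame, row-wise apply can be slow; looping over the distinct years and assigning with a boolean mask would probably be faster, but the version above stays closest to the original attempt.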
