Reputation: 99
I have two dataframes A & B, I am adding a column in dataframe A and trying to fill it up depending on the values of three columns in dataframe B.
This is dataframe B:
Clinton Trump State
0 151581.0 130614.0 Alabama
1 68429.0 91087.0 California
2 62435.0 89199.0 Arizona
3 22927.0 72846.0 Alaska
4 58669.0 33928.0 Arkansas
This is dataframe A:
State Education Percentage
0 Alabama NOHighSchoolPercent 0.85
1 Alabama PostgraduatePercent 0.55
2 Alabama BachelorsPercent 0.69
3 Arkansas NOHighSchoolPercent 0.80
4 California NOHighSchoolPercent 0.72
I want to add a column "Majority" in A that either says "Trump" or " Clinton" based on the number of votes in dataframe B. The first condition is that the states should match, and the second condition checks who got more votes.
The resulting dataframe A should look something like this:
State Education Percentage Majority
0 Alabama NOHighSchoolPercent 0.85 Clinton
1 Alabama PostgraduatePercent 0.55 Clinton
2 Alabama BachelorsPercent 0.69 Clinton
3 Arkansas NOHighSchoolPercent 0.80 Clinton
4 California NOHighSchoolPercent 0.72 Trump
I know how to add a column and assign a condition, the trouble is that the conditions are between 3 columns in a separate dataframe (B)
I was thinking something along those lines:
def get_result(votes_clinton, votes_trump):
if votes_clinton > votes_trump
return 'clinton'
else
return 'trump'
A['Majority'] = B['Clinton','Trump'].map(get_result)
However I still need to account to whether the states are equal and this is where the code structure gets confusing
Note: Some states may not be present in both dataframes
Upvotes: 0
Views: 51
Reputation: 59579
The join is on a single non-duplicated column in dfb
, so map
is appropriate. Set the index then use .idxmax(axis=1)
.
dfa['Majority'] = dfa.State.map(dfb.set_index('State').idxmax(1))
print(dfa)
# State Education Percentage Majority
#0 Alabama NOHighSchoolPercent 0.85 Clinton
#1 Alabama PostgraduatePercent 0.55 Clinton
#2 Alabama BachelorsPercent 0.69 Clinton
#3 Arkansas NOHighSchoolPercent 0.80 Clinton
#4 California NOHighSchoolPercent 0.72 Trump
The restult of dfb.set_index('State').idxmax(1)
is a Series, indexed by 'State'
, which can be used to map the 'State'
in dfa
to the majority candidate.
State
Alabama Clinton
California Trump
Arizona Trump
Alaska Trump
Arkansas Clinton
dtype: object
Upvotes: 2