Hassan Dbouk
Hassan Dbouk

Reputation: 99

Iterating through a dataframe then adding data from another dataframe depending on two conditions

I have two dataframes A & B, I am adding a column in dataframe A and trying to fill it up depending on the values of three columns in dataframe B.

This is dataframe B:

    Clinton    Trump       State
0   151581.0   130614.0    Alabama
1   68429.0    91087.0     California
2   62435.0    89199.0     Arizona
3   22927.0    72846.0     Alaska
4   58669.0    33928.0     Arkansas

This is dataframe A:

    State       Education             Percentage
0   Alabama     NOHighSchoolPercent   0.85
1   Alabama     PostgraduatePercent   0.55
2   Alabama     BachelorsPercent      0.69
3   Arkansas    NOHighSchoolPercent   0.80
4   California  NOHighSchoolPercent   0.72

I want to add a column "Majority" in A that either says "Trump" or " Clinton" based on the number of votes in dataframe B. The first condition is that the states should match, and the second condition checks who got more votes.

The resulting dataframe A should look something like this:

        State       Education             Percentage   Majority
    0   Alabama     NOHighSchoolPercent   0.85         Clinton
    1   Alabama     PostgraduatePercent   0.55         Clinton
    2   Alabama     BachelorsPercent      0.69         Clinton
    3   Arkansas    NOHighSchoolPercent   0.80         Clinton
    4   California  NOHighSchoolPercent   0.72         Trump

I know how to add a column and assign a condition, the trouble is that the conditions are between 3 columns in a separate dataframe (B)

I was thinking something along those lines:

def get_result(votes_clinton, votes_trump):
    if votes_clinton > votes_trump
        return 'clinton'
    else
        return 'trump'

A['Majority'] = B['Clinton','Trump'].map(get_result)

However I still need to account to whether the states are equal and this is where the code structure gets confusing

Note: Some states may not be present in both dataframes

Upvotes: 0

Views: 51

Answers (1)

ALollz
ALollz

Reputation: 59579

The join is on a single non-duplicated column in dfb, so map is appropriate. Set the index then use .idxmax(axis=1).

dfa['Majority'] = dfa.State.map(dfb.set_index('State').idxmax(1))

print(dfa)
#        State            Education  Percentage Majority
#0     Alabama  NOHighSchoolPercent        0.85  Clinton
#1     Alabama  PostgraduatePercent        0.55  Clinton
#2     Alabama     BachelorsPercent        0.69  Clinton
#3    Arkansas  NOHighSchoolPercent        0.80  Clinton
#4  California  NOHighSchoolPercent        0.72    Trump

The restult of dfb.set_index('State').idxmax(1) is a Series, indexed by 'State', which can be used to map the 'State' in dfa to the majority candidate.

State
Alabama       Clinton
California      Trump
Arizona         Trump
Alaska          Trump
Arkansas      Clinton
dtype: object

Upvotes: 2

Related Questions