artemis
artemis

Reputation: 7261

Joining two pandas dataframes are failing

I have two pandas dataframes that look like this:

df_out:

Prediction       count_human  count_bot  %_bot_tweets
username
666STEVEROGERS             8        131      0.942446
ADELE_BROCK                0        126      1.000000
ADRIANAMFTTT              99          0      0.000000
AHMADRADJAB                0        108      1.000000
ALBERTA_HAYNESS          101          0      0.000000
ALTMANBELINDA              0        139      1.000000
ALVA_MC_GHEE              29        104      0.781955
ANGELITHSS                 0        113      1.000000
ANN1EMCCONNELL             0        125      1.000000
ANWARJAMIL22               0        112      1.000000
AN_N_GASTON                0        107      1.000000
ARONHOLDEN8               89         31      0.258333
ARTHCLAUDIA                0        103      1.000000
ASSUNCAOWALLAS             0        108      1.000000
BECCYWILL                  0        132      1.000000
BELOZEROVNIKIT           132          8      0.057143
BEN_SAR_GENT              24         84      0.777778
BERT_HENLEY              105          0      0.000000
BISHOLORINE                0        117      1.000000
BLACKERTHEBERR5            4        100      0.961538
BLACKTIVISTSUS            49         68      0.581197
BLACK_ELEVATION           32         74      0.698113
BOGDANOVAO2                0        127      1.000000
BREMENBOTE                70         39      0.357798
B_stever96                 0        171      1.000000
CALIFRONIAREP             60         72      0.545455
C_dos_94                   0        121      1.000000
Cassidygirly               0        153      1.000000
ChuckSpeaks_               0        185      1.000000
Cyabooty                 111          0      0.000000
DurkinSays                 0        131      1.000000
LSU_studyabroad          117          0      0.000000
MisMonWEXP               131          0      0.000000
NextLevel_Mel              0        185      1.000000
PeterDuca                108          0      0.000000
ShellMarcel                0         97      1.000000
Sir_Fried_Alott            0        144      1.000000
XavierRivera_            197          0      0.000000
ZacharyFlair             213          0      0.000000
brentvarney44              0        126      1.000000
cbars68                  225          0      0.000000
chloeschultz11             0        106      1.000000
hoang_le_96                0        104      1.000000
kdougherty178              0        127      1.000000
lasallephilo             138          0      0.000000
lovely_cunt_               0        137      1.000000
megliebsch                 0        217      1.000000
msimps_15                138          0      0.000000
okweightlossdna          105          0      0.000000
tankthe_hank             231          0      0.000000

And knn_res:

      following  followers        username  Prediction  is_bot
0           199         77      megliebsch           1       0
1           199         77      megliebsch           1       0
2           199         77      megliebsch           1       0
3           199         77      megliebsch           1       0
4           199         77      megliebsch           1       0
...         ...        ...             ...         ...     ...
6643         67         57  ASSUNCAOWALLAS           1       1
6644         67         57  ASSUNCAOWALLAS           1       1
6645         67         57  ASSUNCAOWALLAS           1       1
6646         67         57  ASSUNCAOWALLAS           1       1
6647         67         57  ASSUNCAOWALLAS           1       1

What I am trying to do is, for each username in df_out, left join to knn_res to get the following and followers values.

In SQL, I could do this with: SELECT a.*, b.following, b.followers FROM df_out a LEFT JOIN knn_res b ON a.username = b.username

I have tried:

test_df = df_out
test_df.set_index('username').join(knn_res.set_index('username'), on='username', how='left')
print(test_df)

Which yielded:

  File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
    raise KeyError("None of {} are in the columns".format(missing))
KeyError: "None of ['username'] are in the columns"

What am I doing wrong? I tried to reference this documentation for the problem.

UPDATE

I also tried inner join, which yielded the exact same results:

  File "C:\Python367-64\lib\site-packages\pandas\core\frame.py", line 4396, in set_index
    raise KeyError("None of {} are in the columns".format(missing))
KeyError: "None of ['username'] are in the columns"

df_out is created with:

df_out = (knn_res.groupby(['username', 'Prediction']).is_bot.count().unstack(fill_value=0).
             rename({0: 'count_human', 1: 'count_bot'}, axis= 1))

df_out['%_bot_tweets'] = df_out['count_bot'] / (df_out['count_bot'] + df_out['count_human'])

Upvotes: 1

Views: 80

Answers (1)

Andy L.
Andy L.

Reputation: 25239

Try this. Default join option is left, so you don't need specify it. Both dataframes have username as index and join works on index, so you also don't need specifying on option. Lastly, you want to join only columns following and followers, so after set username as index, just slice those 2 columns for joining. (Note: you should use copy() when you want to copy original dataframe to a test_df because without copy(), both point to the same dataframe object)

test_df = df_out.copy()
test_df = test_df.join(knn_res.set_index('username')[['following', 'followers']])
print(test_df)

Out[93]:
                 count_human  count_bot  %_bot_tweets  following  followers
username
666STEVEROGERS             8        131      0.942446        NaN        NaN
ADELE_BROCK                0        126      1.000000        NaN        NaN
ADRIANAMFTTT              99          0      0.000000        NaN        NaN
AHMADRADJAB                0        108      1.000000        NaN        NaN
ALBERTA_HAYNESS          101          0      0.000000        NaN        NaN
ALTMANBELINDA              0        139      1.000000        NaN        NaN
ALVA_MC_GHEE              29        104      0.781955        NaN        NaN
ANGELITHSS                 0        113      1.000000        NaN        NaN
ANN1EMCCONNELL             0        125      1.000000        NaN        NaN
ANWARJAMIL22               0        112      1.000000        NaN        NaN
AN_N_GASTON                0        107      1.000000        NaN        NaN
ARONHOLDEN8               89         31      0.258333        NaN        NaN
ARTHCLAUDIA                0        103      1.000000        NaN        NaN
ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
ASSUNCAOWALLAS             0        108      1.000000       67.0       57.0
BECCYWILL                  0        132      1.000000        NaN        NaN
BELOZEROVNIKIT           132          8      0.057143        NaN        NaN
BEN_SAR_GENT              24         84      0.777778        NaN        NaN
BERT_HENLEY              105          0      0.000000        NaN        NaN
BISHOLORINE                0        117      1.000000        NaN        NaN
BLACKERTHEBERR5            4        100      0.961538        NaN        NaN
BLACKTIVISTSUS            49         68      0.581197        NaN        NaN
BLACK_ELEVATION           32         74      0.698113        NaN        NaN
BOGDANOVAO2                0        127      1.000000        NaN        NaN
BREMENBOTE                70         39      0.357798        NaN        NaN
B_stever96                 0        171      1.000000        NaN        NaN
CALIFRONIAREP             60         72      0.545455        NaN        NaN
C_dos_94                   0        121      1.000000        NaN        NaN
Cassidygirly               0        153      1.000000        NaN        NaN
ChuckSpeaks_               0        185      1.000000        NaN        NaN
Cyabooty                 111          0      0.000000        NaN        NaN
DurkinSays                 0        131      1.000000        NaN        NaN
LSU_studyabroad          117          0      0.000000        NaN        NaN
MisMonWEXP               131          0      0.000000        NaN        NaN
NextLevel_Mel              0        185      1.000000        NaN        NaN
PeterDuca                108          0      0.000000        NaN        NaN
ShellMarcel                0         97      1.000000        NaN        NaN
Sir_Fried_Alott            0        144      1.000000        NaN        NaN
XavierRivera_            197          0      0.000000        NaN        NaN
ZacharyFlair             213          0      0.000000        NaN        NaN
brentvarney44              0        126      1.000000        NaN        NaN
cbars68                  225          0      0.000000        NaN        NaN
chloeschultz11             0        106      1.000000        NaN        NaN
hoang_le_96                0        104      1.000000        NaN        NaN
kdougherty178              0        127      1.000000        NaN        NaN
lasallephilo             138          0      0.000000        NaN        NaN
lovely_cunt_               0        137      1.000000        NaN        NaN
megliebsch                 0        217      1.000000      199.0       77.0
megliebsch                 0        217      1.000000      199.0       77.0
megliebsch                 0        217      1.000000      199.0       77.0
megliebsch                 0        217      1.000000      199.0       77.0
megliebsch                 0        217      1.000000      199.0       77.0
msimps_15                138          0      0.000000        NaN        NaN
okweightlossdna          105          0      0.000000        NaN        NaN
tankthe_hank             231          0      0.000000        NaN        NaN

Upvotes: 1

Related Questions