Reputation: 47
I have a dataframe:
ID |
---|
239200202 |
14700993 |
1153709258720067584 |
I also have an output array indicating whether each ID is a bot or not, in the form [1,1,0]. How can I combine the two into one dataframe like this:
ID | Bot |
---|---|
239200202 | bot |
14700993 | bot |
1153709258720067584 | Not bot |
I tried this code, but it didn't work:
test = pd.read_csv('./user_data/user_lookup/dataset/test_dataframe.csv', index_col=1)
df = pd.DataFrame(columns=['UserID','Bot/Not'])
for index,row in test.iterrows():
    if test[index] == 1:
        df.loc[index,['UserID']] = test['User ID']
        df.loc[index,['Bot/Not']] = 'Bot'
    if test[index] == 0:
        df.loc[index, ['UserID']] = test['User ID']
        df.loc[index, ['Bot/Not']] = 'Not-Bot'
print(df)
It would be great if someone could help me out. Thank you.
Upvotes: 2
Views: 339
Reputation: 42758
Use indexing into an array:
import numpy as np
import pandas as pd
df = pd.DataFrame({'UserID': [239200202, 14700993, 1153709258720067584]})
is_bot = np.array([1,1,0])
df['Bot'] = np.array(['not bot', 'bot'])[is_bot]
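To spell out why the lookup works, here is a minimal sketch using the sample IDs from the question. The `labels` name is only for illustration; the key assumption is that the flags are exactly 0 or 1, so each flag can serve as a position in a two-element label array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'UserID': [239200202, 14700993, 1153709258720067584]})
is_bot = np.array([1, 1, 0])

# Position 0 holds the label for flag 0, position 1 the label for flag 1.
labels = np.array(['Not bot', 'bot'])

# Indexing `labels` with an integer array picks one label per flag,
# giving an array of the same length as `is_bot`.
df['Bot'] = labels[is_bot]
print(df)
```

The flag array must have one entry per row of the dataframe, in the same order as the IDs.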
Upvotes: 0
Reputation: 260
It's best to use pd.concat here to merge the two DataFrames into one.
Also, try to avoid iterrows when working with DataFrames; it is substantially slower than column-wise operations.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [100, 101, 102]})
bot_not_bot = np.array([1,0,1])
df = pd.concat([df, pd.DataFrame({'bot/not bot': bot_not_bot})], axis=1)
Instead of looping with iterrows, use apply to convert the flags to labels; it is faster on larger DataFrames:
df['bot/not bot'] = df['bot/not bot'].apply(lambda x: 'Bot' if x else 'Not Bot')
In general, prefer column-wise operations like this over iterrows when working with DataFrames.
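As a possible alternative to apply, the same relabelling can be sketched with a dictionary lookup via Series.map, or with np.where. This assumes the flags are strictly 0 or 1; the second column name is made up here only so both results can be shown side by side.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [100, 101, 102]})
bot_not_bot = np.array([1, 0, 1])

# Option 1: wrap the flags in a Series aligned with df, then map flag -> label.
flags = pd.Series(bot_not_bot, index=df.index)
df['bot/not bot'] = flags.map({1: 'Bot', 0: 'Not Bot'})

# Option 2: np.where picks the first label where the condition holds,
# the second label elsewhere. (Illustrative column name.)
df['bot/not bot (where)'] = np.where(bot_not_bot == 1, 'Bot', 'Not Bot')
print(df)
```

With map, any flag value missing from the dictionary becomes NaN, which can help surface unexpected values.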
Upvotes: 1
Reputation: 475
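Here is a solution to the above problem:
array = [1,1,0]
# Assign one label per row; the array must be in the same order as the rows of df.
df['BOT'] = ['bot' if flag == 1 else 'Not bot' for flag in array]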
Upvotes: 1
Reputation: 4875
Based on the hints in your question, you can add a column named Bot to the test dataframe as follows:
new_pred = ['bot' if x==1 else 'Not bot' for x in pred_logreg_test]
test['Bot'] = list(new_pred)
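A self-contained sketch of the same idea, for anyone who wants to run it directly. The values in `pred_logreg_test` are a hypothetical stand-in taken from the question's [1,1,0] example, and the length check is an added safeguard, not part of the original answer.

```python
import pandas as pd

test = pd.DataFrame({'ID': [239200202, 14700993, 1153709258720067584]})

# Hypothetical stand-in for the model's predictions (1 = bot, 0 = not bot).
pred_logreg_test = [1, 1, 0]

# Fail early with a clear message if there is not one prediction per row.
if len(pred_logreg_test) != len(test):
    raise ValueError('expected one prediction per row of test')

# Convert each flag to its label and attach the result as a new column.
test['Bot'] = ['bot' if x == 1 else 'Not bot' for x in pred_logreg_test]
print(test)
```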
Upvotes: 1