Reputation: 377
Suppose I have two data frames, df1 and df2. df1 has several columns, such as userid, sexid, and location. df2 has the same columns as df1 except sexid, which I need to fill in using some prediction algorithm. I am a beginner and have only worked on other kinds of problems so far, so any advice or useful references that could help me solve this would be welcome.
Upvotes: 1
Views: 5792
Reputation: 12515
A minimal example:
import pandas as pd
from sklearn.linear_model import LogisticRegression
# df1 holds the known labels (sexid); df2 has the same features but no labels
df1 = pd.DataFrame({'sexid': list('MMFFMFFMMF'), 'x1': [0, 12, 2, 3, 4, 2, 0, 12, 12, 12], 'x2': [0, 1, 1, 1, 0, 1, 1, 0, 0, 1]})
df2 = pd.DataFrame({'x1': [0, 12, 2, 3, 4, 2, 0, 12, 12, 12], 'x2': [0, 1, 1, 1, 0, 1, 1, 0, 0, 1]})
# Features and target come from the labelled frame
X = df1[['x1', 'x2']]
y = df1['sexid']
# Fit a classifier on df1, then predict the missing sexid values for df2
model = LogisticRegression()
model.fit(X, y)
model.predict(df2)
Which returns:
array(['F', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'M', 'M'], dtype=object)
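To actually fill the missing column, the predictions can be assigned back to df2. The following is a minimal sketch reusing the toy data from the example; the `features` list and the `sexid_confidence` column are names introduced here for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy data as in the example above
df1 = pd.DataFrame({
    'sexid': list('MMFFMFFMMF'),
    'x1': [0, 12, 2, 3, 4, 2, 0, 12, 12, 12],
    'x2': [0, 1, 1, 1, 0, 1, 1, 0, 0, 1],
})
df2 = pd.DataFrame({
    'x1': [0, 12, 2, 3, 4, 2, 0, 12, 12, 12],
    'x2': [0, 1, 1, 1, 0, 1, 1, 0, 0, 1],
})

# Hypothetical helper name: the columns shared by both frames
features = ['x1', 'x2']

model = LogisticRegression()
model.fit(df1[features], df1['sexid'])

# Write the predicted labels into the column that df2 was missing
df2['sexid'] = model.predict(df2[features])

# Optional (illustrative column name): probability of the predicted class
df2['sexid_confidence'] = model.predict_proba(df2[features]).max(axis=1)
```

Selecting `df2[features]` explicitly keeps the prediction step working even after new columns such as `sexid` have been added to df2.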
I would highly recommend you read this.
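If some of your columns are text, such as location, they need to be encoded as numbers before the model can use them. Here is one possible sketch using a scikit-learn pipeline with one-hot encoding. The data, the `age` column, and the frame names `train` and `to_fill` are all made up for illustration, and userid is left out of the features on the assumption that it is only an identifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data: 'train' plays the role of df1, 'to_fill' the role of df2
train = pd.DataFrame({
    'userid': [1, 2, 3, 4, 5, 6],
    'location': ['NY', 'LA', 'NY', 'SF', 'LA', 'SF'],
    'age': [23, 35, 41, 29, 52, 31],
    'sexid': ['M', 'F', 'M', 'F', 'F', 'M'],
})
to_fill = pd.DataFrame({
    'userid': [7, 8, 9],
    'location': ['LA', 'NY', 'Paris'],
    'age': [30, 44, 27],
})

# One-hot encode the text column; pass numeric columns through unchanged.
# handle_unknown='ignore' avoids an error for locations not seen in training.
preprocess = ColumnTransformer(
    [('location', OneHotEncoder(handle_unknown='ignore'), ['location'])],
    remainder='passthrough',
)
clf = make_pipeline(preprocess, LogisticRegression(max_iter=1000))

# userid is excluded: it is assumed to carry no predictive information
features = ['location', 'age']
clf.fit(train[features], train['sexid'])

# Fill the missing column with the predicted labels
to_fill['sexid'] = clf.predict(to_fill[features])
```

Putting the encoder inside the pipeline means the same encoding learned from the labelled frame is applied to the unlabelled one automatically.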
Upvotes: 3