Reputation: 37
I am building a recommender system of foods and I have a dataframe:
df:
           meat  vegetables  cheese  ketchup  egg  ...
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0
...
I also have a list which contains the ingredients that a user does not like:
dislike:["cheese", "egg"]
So what I am trying to do is create a function which adds a new row "user_name" with a 10 in the ingredients that he/she does not like and a 0 in all the other columns. The output should be:
           meat  vegetables  cheese  ketchup  egg  ...
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0
new_user      0           0      10        0   10
...
I have simplified the dataframe and the list to make them easier to follow, but they are actually much longer.
This is what I have written so far:
def user_pre(df):
    dislike=["cheese","egg"]
    for ing in dislike:
        df.loc["new_user"]= pd.Series({ing:10})
    return df
It "works", but only for the last element in the dislike list. Besides, it does not put a 0 in the other cells but a NaN.
Thank you so much in advance!
Upvotes: 3
Views: 233
Reputation: 307
I am not sure how "healthy" it is to mix users with dishes in a single pandas DataFrame but a function like this should do the work:
def insert_user_dislikes(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = [10 if col in ingredients else 0 for col in df.columns]

insert_user_dislikes('new_user', df, ['meat', 'egg'])
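One thing to keep in mind with the function above: the default `df=df` is bound when the function is defined, and the function modifies the frame in place. A minimal sketch of a variant that takes the frame explicitly and returns a copy could look like this (the names `add_user_row`, `foods` and `score` are only illustrative, they are not from the question):

```python
import pandas as pd


def add_user_row(df, user_name, disliked, score=10):
    """Return a copy of df with one extra row for user_name."""
    out = df.copy()  # leave the caller's frame untouched
    # score for disliked ingredients, 0 for every other column
    out.loc[user_name] = [score if col in disliked else 0 for col in out.columns]
    return out


foods = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)

result = add_user_row(foods, "new_user", ["cheese", "egg"])
print(result)
```

Returning a copy costs a bit of memory, but it makes the function safe to call repeatedly with different users.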
Edit 1: I like @Fred's Solution as well:
def insert_user_dislikes2(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = 0             # create the row filled with zeros
    df.loc[user_name, ingredients] = 10  # then mark the disliked ingredients

insert_user_dislikes2('new_user', df, ['meat', 'egg'])
Edit 2: Here is Shubham's solution for performance assessment:
def insert_user_dislikes3(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    s = pd.Series(
        np.where(df.columns.isin(ingredients), 10, 0),
        name=user_name, index=df.columns, dtype='int')
    # Note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there
    return df.append(s)
In terms of performance (on a very small dataset), it looks like the list comprehension version is the fastest:
df = pd.DataFrame([[3, 5, 2, 2, 1],
                   [0, 0, 4, 0, 1]],
                  columns=['meat', 'vegetables', 'cheese', 'ketchup', 'egg'],
                  index=['hamburger', 'pasta'])
print(timeit.timeit(insert_user_dislikes, number=1000))
0.125
print(timeit.timeit(insert_user_dislikes2, number=1000))
0.547
print(timeit.timeit(insert_user_dislikes3, number=1000))
2.153
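Note that in the timings above the first two functions write into the same global df on every call, so after the first run they overwrite an existing row instead of adding one. A rough sketch of a fairer comparison, assuming you copy a base frame on each call (the helper names `bench`, `with_list_comprehension` and `with_two_assignments` are made up for this example), might be:

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta"],
)
ingredients = ["meat", "egg"]


def with_list_comprehension(df):
    df.loc["new_user"] = [10 if col in ingredients else 0 for col in df.columns]
    return df


def with_two_assignments(df):
    df.loc["new_user"] = 0
    df.loc["new_user", ingredients] = 10
    return df


def bench(func, number=200):
    # Copy the base frame on every call so each run really adds a new row
    return timeit.timeit(lambda: func(base.copy()), number=number)


timings = {f.__name__: bench(f) for f in (with_list_comprehension, with_two_assignments)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

The copy itself adds the same overhead to every candidate, so only the relative numbers are meaningful, and they will vary with the machine and the pandas version.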
Upvotes: 3
Reputation: 71689
You can use Index.isin to check which column names of the dataframe are present in the dislike list, then use DataFrame.append to append the newly created series s to the original dataframe df.
Use:
import numpy as np
import pandas as pd

# build the new row: 10 for disliked columns, 0 elsewhere
s = pd.Series(
    np.where(df.columns.isin(dislike), 10, 0),
    name='new_user', index=df.columns, dtype='int')
df = df.append(s)
The resulting dataframe df will be:
           meat  vegetables  cheese  ketchup  egg
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0
new_user      0           0      10        0   10
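DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent version the last step needs pd.concat instead. A minimal sketch of the same idea, assuming the example frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)
dislike = ["cheese", "egg"]

# 10 where the column name is in the dislike list, 0 elsewhere
s = pd.Series(
    np.where(df.columns.isin(dislike), 10, 0),
    name="new_user", index=df.columns, dtype="int")

# Turn the series into a one-row frame and stack it under the original
df = pd.concat([df, s.to_frame().T])
print(df)
```

The series name becomes the row label after the transpose, so the result has the same shape as the output shown above.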
Upvotes: 0
Reputation: 3184
Set the whole new_user row to zero, then select the disliked columns of that row and set them to 10.
print(df)
           meat  vegetables  cheese  ketchup  egg
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0
Create the new_user row filled with zeros.
df.loc["new_user", :] = 0
print(df)
           meat  vegetables  cheese  ketchup  egg
hamburger   3.0         5.0     2.0      2.0  1.0
pasta       0.0         0.0     4.0      0.0  1.0
soup        0.0         2.0     0.0      0.0  0.0
new_user    0.0         0.0     0.0      0.0  0.0
Then assign again, this time only to the disliked columns, setting them to 10.
dislike = ["cheese", "egg"]
df.loc["new_user", dislike] = 10
print(df)
           meat  vegetables  cheese  ketchup  egg
hamburger   3.0         5.0     2.0      2.0  1.0
pasta       0.0         0.0     4.0      0.0  1.0
soup        0.0         2.0     0.0      0.0  0.0
new_user    0.0         0.0    10.0      0.0 10.0
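As the printed output shows, adding the row this way turned the integer columns into floats here. If you want integers back, one option (just a sketch, assuming every value in the frame is a whole number) is to cast once at the end:

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)
dislike = ["cheese", "egg"]

df.loc["new_user", :] = 0        # new row of zeros
df.loc["new_user", dislike] = 10  # mark the disliked ingredients

# Cast back to integers; safe here because no value has a fractional part
df = df.astype(int)
print(df.dtypes)
```

Whether the float conversion happens at all may depend on the pandas version, so the cast is harmless if the columns are already integers.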
Upvotes: 0
Reputation: 502
I'm not sure about how efficient the approach is, but this should work
dislikes = ["cheese","egg"]
new_user = "Tom"
df.loc[new_user] = 0
for dislike in dislikes:
    # add the ingredient as a new column if no dish uses it yet
    if dislike not in df.columns:
        df[dislike] = 0
    df.loc[new_user, dislike] = 10
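If the loop turns out to be slow on a long dislike list, the same result can probably be had without it by adding the missing columns in one go with reindex. A rough sketch, where "olives" is a made-up ingredient that is not in the original columns:

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)
dislikes = ["cheese", "olives"]  # "olives" is hypothetical, not a column yet
new_user = "Tom"

# Append any unknown ingredients as new columns filled with 0
missing = [c for c in dislikes if c not in df.columns]
df = df.reindex(columns=list(df.columns) + missing, fill_value=0)

df.loc[new_user] = 0            # new row of zeros
df.loc[new_user, dislikes] = 10  # mark all disliked ingredients at once
print(df)
```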
Upvotes: 2