user2906657

Reputation: 541

Data manipulation in Pandas/Python

This seems like a simple data manipulation operation, but I am stuck on it.

I have a recommendation dataset for a campaign.

Masteruserid content 

1             100
1             101
1             102
2             100
2             101
2             110

Now, for each user we want to recommend at least 5 content items. For instance, Masteruserid 1 has three recommendations, so I want to pick the remaining two at random from globally viewed content, which is a separate dataset (a list). I also have to check for duplicates, in case a randomly picked item is already present in the raw dataset.

global_content
100
300
301
101
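
For anyone who wants to reproduce this, the two samples above can be built roughly as follows. This is only a sketch: the variable names df, global_content and shortfall are assumed for illustration and are not from the real dataset. It also counts how many items each user is short of the target of 5.

```python
import pandas as pd

# Sample recommendations from the question (variable names are assumed)
df = pd.DataFrame({
    "Masteruserid": [1, 1, 1, 2, 2, 2],
    "content": [100, 101, 102, 100, 101, 110],
})

# Globally viewed content, kept as a plain list
global_content = [100, 300, 301, 101]

# How many recommendations each user is short of the target of 5
shortfall = (5 - df.groupby("Masteruserid")["content"].size()).clip(lower=0)
print(shortfall)
```

With this small sample, both users have three items, so each needs two more.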

In the actual data I have 4000+ Masteruserids. I would like some help on how to start approaching this.

Upvotes: 2

Views: 155

Answers (2)

Merlin

Reputation: 25649

Try this, using the following as the pool of candidate recommendations:

df2['global_content']

0    100
1    300
2    301
3    101
4    400
5    500
6    401
7    501

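import numpy as np
import pandas as pd
recs = pd.DataFrame()
# For each user, append random items from df2 that the user does not already have, up to 5 in total
recs['content'] = df.groupby('Masteruserid')['content'].apply(lambda x: list(x) + np.random.choice(df2[~df2.isin(list(x))].dropna().values.flatten(), max(0, 5 - len(x)), replace=False).tolist())
recs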

                                    content
Masteruserid                               
1             [100, 101, 102, 300.0, 301.0]
2             [100, 101, 110, 501.0, 301.0]
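
The added items show up as floats (300.0, 301.0) because masking the whole DataFrame with df2[~df2.isin(...)] fills the excluded cells with NaN, which upcasts the column. A possible variant, sketched below, filters the global_content Series directly so the values stay integers. The helper name fill_to_k, the seeded generator, and the sample frames are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question and the list above (assumed names)
df = pd.DataFrame({
    "Masteruserid": [1, 1, 1, 2, 2, 2],
    "content": [100, 101, 102, 100, 101, 110],
})
df2 = pd.DataFrame({"global_content": [100, 300, 301, 101, 400, 500, 401, 501]})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable


def fill_to_k(x, pool, k=5):
    """Hypothetical helper: pad one user's content list to k items from pool."""
    seen = list(x)
    # Boolean-index the Series itself, so no NaN is introduced and ints stay ints
    unseen = pool[~pool.isin(seen)].unique()
    # Never ask for more items than are available, and never a negative count
    extra = max(0, min(k - len(seen), len(unseen)))
    return seen + rng.choice(unseen, size=extra, replace=False).tolist()


recs = (
    df.groupby("Masteruserid")["content"]
      .apply(fill_to_k, pool=df2["global_content"], k=5)
      .to_frame()
)
print(recs)
```

The picks are random, so the exact items will differ from the output above, but each list should hold five distinct integer ids.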

Upvotes: 0

piRSquared

Reputation: 294358

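import numpy as np
import pandas as pd


def add_content(df, gc, k=5):
    # Pad one user's rows up to k recommendations with unseen global content
    n = len(df)
    if n >= k:
        return df  # already has enough rows; return the group unchanged
    gcs = set(gc.squeeze())
    choices = list(gcs.difference(df.content))
    mc = np.random.choice(choices, k - n, replace=False)
    ids = np.repeat(df.Masteruserid.iloc[-1], k - n)
    data = dict(Masteruserid=ids, content=mc)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    return pd.concat([df, pd.DataFrame(data)], ignore_index=True)


# gc is the global content as a single-column DataFrame or Series
gb = df.groupby('Masteruserid', group_keys=False)
gb.apply(add_content, gc).reset_index(drop=True)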
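
Newer pandas releases have changed how groupby.apply treats the grouping column, so here is a sketch of the same idea with an explicit loop over the groups instead. The sample values in gc, the seed, and the names pool, extra_rows and padded are assumptions for illustration; it also caps the number of picks when the pool runs short.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Masteruserid": [1, 1, 1, 2, 2, 2],
    "content": [100, 101, 102, 100, 101, 110],
})
# Assumed sample of globally viewed content
gc = pd.Series([100, 300, 301, 101, 400, 500], name="global_content")

rng = np.random.default_rng(42)  # seeded for repeatability only
k = 5
pool = set(gc)

extra_rows = []
for user, seen in df.groupby("Masteruserid")["content"]:
    # Candidates are global items this user does not already have
    choices = sorted(pool.difference(seen))
    n_new = min(k - len(seen), len(choices))
    if n_new <= 0:
        continue  # user already has k items, or nothing is left to add
    picked = rng.choice(choices, size=n_new, replace=False)
    extra_rows.append(pd.DataFrame({"Masteruserid": user, "content": picked}))

# Append the new rows and keep each user's rows together
padded = (
    pd.concat([df, *extra_rows], ignore_index=True)
      .sort_values("Masteruserid", kind="stable")
      .reset_index(drop=True)
)
print(padded)
```

Taking the set difference before sampling means no duplicate check is needed afterwards.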


Upvotes: 1
