Reputation: 298
I have the following dataframe:
data = [
(27450, 27450, 29420,"10/10/2016"),
(29420 , 36142, 29420, "10/10/2016"),
(11 , 11, 27450, "10/10/2016")]
#Create DataFrame base
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))
The first column contains the user_id, and each line represents one action that he makes. Each user_id shows up either in "Actor1" or "Actor2" column.
First, I would like to create a new column where it will assign the value 1 if the user_id is found in "Actor1" column and 0 otherwise.
Second, I would like to create a new column where for each user_id it will store the "Actor"_i value that he interacted with.
For the above example, the output will look like:
Col1 Col2
1 29420
0 36142
1 27450
What is the most efficient pythonic way to do this?
Thanks a lot in advance!
Upvotes: 0
Views: 3072
Reputation: 5870
Heres my solution - I have assumed that if userid appears in actor1 column its not necessary it'll be in the same row...
df["Col1"] = [1 if i in df["Actor1"].values else 0 for i in df["User_id"].values]
df["Col2"] = [df.iloc[i]["Actor2"] if j == 1 else df.iloc[i]["Actor1"] for i, j in enumerate(df["Col1"].values)]
Output -
User_id Actor1 Actor2 Time Col1 Col2
0 27450 27450 29420 10/10/2016 1 29420
1 29420 36142 29420 10/10/2016 0 36142
2 11 11 27450 10/10/2016 1 27450
Upvotes: 0
Reputation: 880587
import numpy as np
import pandas as pd
data = [(27450, 27450, 29420,"10/10/2016"),
(29420 , 36142, 29420, "10/10/2016"),
(11 , 11, 27450, "10/10/2016")]
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))
mask = (df['User_id'] == df['Actor1'])
df['first actor'] = mask.astype(int)
df['other actor'] = np.where(mask, df['Actor2'], df['Actor1'])
print(df)
yields
User_id Actor1 Actor2 Time first actor other actor
0 27450 27450 29420 10/10/2016 1 29420
1 29420 36142 29420 10/10/2016 0 36142
2 11 11 27450 10/10/2016 1 27450
First create a boolean mask which is True when User_id
equals Actor1
:
In [51]: mask = (df['User_id'] == df['Actor1']); mask
Out[51]:
0 True
1 False
2 True
dtype: bool
Converting mask
to ints creates the first column:
In [52]: mask.astype(int)
Out[52]:
0 1
1 0
2 1
dtype: int64
Then use np.where
to select between two values. np.where(mask, A, B)
returns an array whose ith
value is A[i]
if mask[i]
is True, and B[i]
otherwise. Thus,
np.where(mask, df['Actor2'], df['Actor1'])
takes the value from Actor2
where mask
is True, and the value from Actor1
otherwise:
In [53]: np.where(mask, df['Actor2'], df['Actor1'])
Out[53]: array([29420, 36142, 27450])
Upvotes: 2