Reputation: 555
This issue arises quite often and my current method isn't the optimal one.
eg.
I've a dataframe (df1) that contains some data:
date tweet sent_by
2017/01/01 'OMG @user2 @POTUS' user1
2017/01/01 'THIS IS A TWEET!' user17
...... ..... ......
I've a list of users:
user_list = ['user1','user2', ...]
Now I'd like to go through df1 and check whether any user (exluding the sender) is mentioned in the tweet and create a new dataframe (df2) that contains the relevant information.
My current method:
date_list = []
tweet_list = []
sent_by_list = []
user_mentioned_list = []
for i in range(len(df1)):
for user in user_list:
if user in df1['tweet'][i] and user != df1['sent_by'][i]:
date_list.append(df1['date'][i])
tweet_list.append(df1['tweet'][i])
sent_by_list.append(df1['sent_by'][i])
user_mentioned_list.append(user)
df2 = pd.DataFrame()
df2['date'] = date_list
df2['tweet'] = tweet_list
df2['sent_by'] = sent_by_list
df2['user_mentioned'] = user_mentioned_list
What is the best way to fetch the data and form a new dataframe?
Upvotes: 1
Views: 45
Reputation: 863291
I think you need extract
by list
with |
regex (or)
and then compare with isin
, last filter by boolean indexing
:
print (df1)
date tweet sent_by
0 2017/01/01 'OMG @user2 @POTUS' user1
1 2017/01/01 'OMG @user2 @POTUS' user2
2 2017/01/01 'OMG @user2 @POTUS' user4
3 2017/01/01 'OMG @user4 @POTUS' user1
4 2017/01/01 'THIS IS A TWEET!' user17
user_list = ['user1','user2']
pat = '(' + '|'.join(user_list) + ')'
df1['user_mentioned'] = df1['tweet'].str.extract(pat, expand=False)
df2 = df1[df1['user_mentioned'].isin(user_list) & (df1['user_mentioned']!= df1['sent_by'])]
print (df2)
date tweet sent_by user_mentioned
0 2017/01/01 'OMG @user2 @POTUS' user1 user2
2 2017/01/01 'OMG @user2 @POTUS' user4 user2
Detail:
print (df1)
date tweet sent_by user_mentioned
0 2017/01/01 'OMG @user2 @POTUS' user1 user2
1 2017/01/01 'OMG @user2 @POTUS' user2 user2
2 2017/01/01 'OMG @user2 @POTUS' user4 user2
3 2017/01/01 'OMG @user4 @POTUS' user1 NaN
4 2017/01/01 'THIS IS A TWEET!' user17 NaN
Upvotes: 1