Reputation: 65
Iam new to python. The below code filter rows in a dataframe df based on substrings (keys) in a list and add a new column say 'Key" containing the substring (all of them). The dataframe contains name of student, age, sport. The sport page contains all sports played by him. the list array contains two sports names. The code here extracts the names that play any of the sports mentions in the list. I want another field 'Key' in the dataframe that mentions the matches from list like "hockey" or " "football" or "jockey football" depending on the match. '''
import requests
import pandas as pd
import numpy as np
data = {'Name': ['Tom', 'Joseph','Krish', 'Mohan', 'Ram'], 'Age': [20, 21, 19, 18, 29],'Sport':['football', 'hockey football badminton', 'cricket', 'tennis football', 'hocey cricet']}
df= pd.DataFrame(data)
print(df)
list = ['football','hockey'] # list of sports to filter
list_s = np.array(list)
print(list_s)
#Filter rows from df which are in list_s
dff = df[df['Sport'].str.contains('|'.join(list_s))]
print(dff)
Upvotes: 0
Views: 216
Reputation: 893
While it might be possible in this case to just use substrings, a more robust approach would be to make a new DataFrame that maps each Name to each associated Sport, and select the desired Names from there.
>>> name_to_sport = (
df[['Name']]
.join(df['Sport'].str.split())
.explode('Sport')
)
>>> name_to_sport
Name Sport
0 Tom football
1 Joseph hockey
1 Joseph football
1 Joseph badminton
2 Krish cricket
3 Mohan tennis
3 Mohan football
4 Ram hocey
4 Ram cricet
>>> name_to_sport.loc[name_to_sport['Sport'].isin(['football','hockey']), 'Name'].unique()
array(['Tom', 'Joseph', 'Mohan'], dtype=object)
Upvotes: 1