Ravi
Ravi

Reputation: 65

filter rows in a pandas dataframe from substrings (keys) in a list and also add new column "key" to dataframe containing the substring matched (key)

Iam new to python. The below code filter rows in a dataframe df based on substrings (keys) in a list and add a new column say 'Key" containing the substring (all of them). The dataframe contains name of student, age, sport. The sport page contains all sports played by him. the list array contains two sports names. The code here extracts the names that play any of the sports mentions in the list. I want another field 'Key' in the dataframe that mentions the matches from list like "hockey" or " "football" or "jockey football" depending on the match. '''

import requests
import pandas as pd
import numpy as np

data = {'Name': ['Tom', 'Joseph','Krish', 'Mohan', 'Ram'], 'Age': [20, 21, 19, 18, 29],'Sport':['football', 'hockey football badminton', 'cricket', 'tennis football', 'hocey cricet']}
df= pd.DataFrame(data)
print(df)
list = ['football','hockey']  # list of sports to filter
list_s = np.array(list)
print(list_s)
#Filter rows from df which are in list_s
dff = df[df['Sport'].str.contains('|'.join(list_s))]
print(dff)

Upvotes: 0

Views: 216

Answers (1)

Steele Farnsworth
Steele Farnsworth

Reputation: 893

While it might be possible in this case to just use substrings, a more robust approach would be to make a new DataFrame that maps each Name to each associated Sport, and select the desired Names from there.

>>> name_to_sport = (
    df[['Name']]
    .join(df['Sport'].str.split())
    .explode('Sport')
)
>>> name_to_sport
     Name      Sport
0     Tom   football
1  Joseph     hockey
1  Joseph   football
1  Joseph  badminton
2   Krish    cricket
3   Mohan     tennis
3   Mohan   football
4     Ram      hocey
4     Ram     cricet
>>> name_to_sport.loc[name_to_sport['Sport'].isin(['football','hockey']), 'Name'].unique()
array(['Tom', 'Joseph', 'Mohan'], dtype=object)

Upvotes: 1

Related Questions