Reputation: 95
Let me explain the situation: I'm currently working with data that is sometimes categorized and sometimes not. I decided to use pandas' fillna with 'ffill' as the method, but I don't feel this is the optimal and/or cleanest solution. If someone could help me with a better approach I'd be so grateful. Here is some code to demonstrate the point:
import numpy as np
import pandas as pd

data = {
    "detail": ['apple mac', 'apple iphone x', 'samsumg galaxy s10', 'samsumg galaxy s10', 'hp computer'],
    'category': ['computer', 'phone', 'phone', np.NaN, np.NaN]
}
df = pd.DataFrame(data)
Returns
               detail category
0           apple mac computer
1      apple iphone x    phone
2  samsumg galaxy s10    phone
3  samsumg galaxy s10      NaN
4         hp computer      NaN
First I filtered the detail values without a category:
details_without_cats = df[df.category.isnull()].detail.unique()
Then I loop through these values and fill where it corresponds:
for detail_wc in details_without_cats:
    df[df.detail == detail_wc] = df[df.detail == detail_wc].fillna(method='ffill')
print(df)
This returns exactly what I want:
               detail category
0           apple mac computer
1      apple iphone x    phone
2  samsumg galaxy s10    phone
3  samsumg galaxy s10    phone
4         hp computer      NaN
The dilemma is as follows: what happens if I have this situation with thousands or millions of samples? Is there a better way? Please help.
Upvotes: 2
Views: 88
Reputation: 7214
If you want to create a dict of detail-to-category mappings to use later, you can do this:
maps = df.dropna().set_index('detail').to_dict()['category']
df['category'] = df.set_index('detail').index.map(maps)
maps
{'apple mac': 'computer',
 'apple iphone x': 'phone',
 'samsumg galaxy s10': 'phone'}
output:
               detail category
0           apple mac computer
1      apple iphone x    phone
2  samsumg galaxy s10    phone
3  samsumg galaxy s10    phone
4         hp computer      NaN
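A variation on the same idea (a sketch using the toy data from the question, assuming you only want to touch the missing rows): combining `map` with `fillna` fills the NaN categories without overwriting any category that is already set:

```python
import numpy as np
import pandas as pd

data = {
    "detail": ['apple mac', 'apple iphone x', 'samsumg galaxy s10',
               'samsumg galaxy s10', 'hp computer'],
    'category': ['computer', 'phone', 'phone', np.nan, np.nan],
}
df = pd.DataFrame(data)

# Build the detail -> category mapping from the rows that already have a category
maps = df.dropna().set_index('detail')['category'].to_dict()

# Only fill the missing values; existing categories stay untouched
df['category'] = df['category'].fillna(df['detail'].map(maps))
```

`'hp computer'` never appears with a category, so it simply stays NaN.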
Upvotes: 1
Reputation: 323306
We can do
df['category'] = df.groupby('detail')['category'].ffill()
df
               detail category
0           apple mac computer
1      apple iphone x    phone
2  samsumg galaxy s10    phone
3  samsumg galaxy s10    phone
4         hp computer      NaN
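One caveat, shown as a sketch on a reduced example: `ffill` only propagates values downward, so a row whose category appears later in its group would stay NaN. `transform('first')` takes the first non-null value per group regardless of row order:

```python
import numpy as np
import pandas as pd

# The known category appears *after* the NaN row, so ffill alone would miss it
df = pd.DataFrame({
    'detail': ['samsumg galaxy s10', 'samsumg galaxy s10'],
    'category': [np.nan, 'phone'],
})

# 'first' returns the first non-null value within each group
df['category'] = df.groupby('detail')['category'].transform('first')
```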
Upvotes: 1