Jose_Chavez

Reputation: 95

Is there a better way to do a segmented fillna with method 'ffill' in pandas?

Let me explain the situation. I'm working with data that is only sometimes categorized, so I decided to use pandas' fillna with 'ffill' as the method. I just don't feel this is the optimal or cleanest solution. If someone could help me with a better approach, I'd be very grateful. Here is some code to demonstrate the point:

import numpy as np
import pandas as pd

data = {
    'detail': ['apple mac', 'apple iphone x', 'samsumg galaxy s10', 'samsumg galaxy s10', 'hp computer'],
    'category': ['computer', 'phone', 'phone', np.nan, np.nan]  # np.NaN was removed in NumPy 2.0
}

df = pd.DataFrame(data)

Returns

    detail              category
0   apple mac           computer
1   apple iphone x      phone
2   samsumg galaxy s10  phone
3   samsumg galaxy s10  NaN
4   hp computer         NaN

First, I filter the detail values that have no category:

details_without_cats = df[df.category.isnull()].detail.unique()

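Then I loop through these values and fill where a category is available:

for detail_wc in details_without_cats:
    # forward-fill only within the rows that share this detail
    # (.ffill() replaces the deprecated fillna(method='ffill'))
    df[df.detail == detail_wc] = df[df.detail == detail_wc].ffill()
print(df)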

This returns exactly what I want:

    detail              category
0   apple mac           computer
1   apple iphone x      phone
2   samsumg galaxy s10  phone
3   samsumg galaxy s10  phone
4   hp computer         NaN

The dilemma is this: what happens when I have thousands or millions of samples? Is there a better way? Please help.

Upvotes: 2

Views: 88

Answers (2)

oppressionslayer

Reputation: 7214

If you want to create a dict mapping each detail to its category for later use, you can do this:

maps = df.dropna().set_index('detail').to_dict()['category']
df['category'] = df.set_index('detail').index.map(maps)

maps

{'apple mac': 'computer',
 'apple iphone x': 'phone',
 'samsumg galaxy s10': 'phone'}

output:

               detail  category
0           apple mac  computer
1      apple iphone x     phone
2  samsumg galaxy s10     phone
3  samsumg galaxy s10     phone
4         hp computer       NaN
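
A self-contained sketch of the same mapping idea, in case it helps. The name `lookup` and the `drop_duplicates` step are assumptions added here: they guard against a detail appearing more than once among the labelled rows by keeping the first category seen. Note that, unlike a forward fill, a mapping also fills a missing category when the labelled row appears later in the frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "detail": ["apple mac", "apple iphone x", "samsumg galaxy s10",
               "samsumg galaxy s10", "hp computer"],
    "category": ["computer", "phone", "phone", np.nan, np.nan],
})

# Build a detail -> category lookup from the labelled rows only.
# drop_duplicates keeps the first category seen per detail (an assumption).
lookup = (
    df.dropna(subset=["category"])
      .drop_duplicates(subset="detail")
      .set_index("detail")["category"]
)

# Keep existing categories and fill only the gaps from the lookup.
df["category"] = df["category"].fillna(df["detail"].map(lookup))
print(df)
```

Details that never have a category, such as 'hp computer' here, stay NaN.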

Upvotes: 1

BENY

Reputation: 323306

We can forward-fill within each detail group:

df['category']=df.groupby('detail')['category'].ffill()
df
               detail  category
0           apple mac  computer
1      apple iphone x     phone
2  samsumg galaxy s10     phone
3  samsumg galaxy s10     phone
4         hp computer       NaN
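
As a rough check, the sketch below compares the grouped forward fill with a per-detail loop like the one in the question, on the same sample data. The names `looped` and `grouped` are illustrative only; the loop is rewritten with `.loc` and `.ffill()` but is meant to do the same thing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "detail": ["apple mac", "apple iphone x", "samsumg galaxy s10",
               "samsumg galaxy s10", "hp computer"],
    "category": ["computer", "phone", "phone", np.nan, np.nan],
})

# Reference result: forward-fill one detail at a time, as in the question.
looped = df.copy()
for detail in looped.loc[looped["category"].isna(), "detail"].unique():
    mask = looped["detail"] == detail
    looped.loc[mask, "category"] = looped.loc[mask, "category"].ffill()

# Vectorized: forward-fill within each detail group in a single call.
grouped = df.copy()
grouped["category"] = grouped.groupby("detail")["category"].ffill()

# DataFrame.equals treats NaNs in the same position as equal.
print(grouped.equals(looped))
```

The grouped version avoids building a boolean mask per distinct detail, which is where the loop spends its time as the number of distinct details grows.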

Upvotes: 1
