domiziano

Reputation: 470

Subset a row based on the column with similar name

Assuming a pandas DataFrame like the one in the picture, I would like to fill the NA values with values from the other columns that are similar to them. To be more precise, my variables are

mean_1, mean_2, ..., std_1, std_2, ..., min_1, min_2, ...

So I would like to fill each NA value with the values of the other columns, but not all of them: only the columns that represent the same metric. In the picture I highlighted 2 NA values. I would like to fill the first one with the mean of the 'MEAN' variables in row 2, and the second one with the mean of the 'MIN' variables in row 9. Is there a way to do it?
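Since the picture is not available, here is a hypothetical toy frame with the same naming pattern (the values are made up, not taken from the question), plus the grouping step the answers below build on: mapping each metric prefix, assumed to be the text before the first `_`, to its columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking the <metric>_<n> column layout
df = pd.DataFrame({
    'mean_1': [1.0, np.nan, 3.0],
    'mean_2': [3.0, 4.0, 5.0],
    'std_1': [0.5, 0.2, np.nan],
    'std_2': [0.7, 0.4, 0.6],
    'min_1': [0.0, 1.0, np.nan],
    'min_2': [2.0, 3.0, 2.0],
})

# Map each metric prefix ("mean", "std", "min") to the list of its columns
groups = {}
for col in df.columns:
    groups.setdefault(col.split('_')[0], []).append(col)

print(groups)
```

Each NaN should then be filled only from the other columns in its own group and its own row.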


Upvotes: 1

Views: 71

Answers (2)

Shijith

Reputation: 4872

You can find the unique column prefixes, iterate through them, and call fillna on each subset of columns separately:

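uniq_prefixes = {col.split('_')[0] for col in df.columns}

for prfx in uniq_prefixes:
    # Select only the columns whose prefix matches exactly
    mask = [col for col in df.columns if col.split('_')[0] == prfx]
    # Transpose is needed because row-wise fillna with a Series is not implemented:
    # after .T each original row is a column, and fillna(Series) fills each
    # of those columns with the matching row mean
    df.loc[:, mask] = df[mask].T.fillna(df[mask].mean(axis=1)).T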
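The same per-prefix fill can also be sketched without the double transpose by using `DataFrame.where` with `axis=0`, which aligns the row-mean Series with the frame's index and broadcasts it across the group's columns. This is an alternative, not part of the original answer; the toy data is hypothetical and column names are assumed to follow the `<metric>_<n>` pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the <metric>_<n> column layout
df = pd.DataFrame({
    'mean_1': [1.0, np.nan],
    'mean_2': [3.0, 4.0],
    'min_1': [0.0, np.nan],
    'min_2': [2.0, 3.0],
    'min_3': [4.0, 5.0],
})

prefixes = {c.split('_')[0] for c in df.columns}
for prfx in prefixes:
    cols = [c for c in df.columns if c.split('_')[0] == prfx]
    sub = df[cols]
    # Keep observed values; replace each NaN with its row's group mean.
    # axis=0 aligns the Series (one mean per row) with the frame's index.
    df[cols] = sub.where(sub.notna(), sub.mean(axis=1), axis=0)

print(df)
```

In row 1, `mean_1` becomes 4.0 (the only other mean) and `min_1` becomes 4.0 (the mean of 3.0 and 5.0).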

Upvotes: 1

Kate Melnykova

Reputation: 1873

Yes, it is possible with a loop. Below is a naive approach; even with fancier ones I don't see much room for optimisation.

import pandas as pd

prefixes = ('mean', 'std', 'min')  # metric groups to fill

for i, row in df.iterrows():
    sums = {p: 0.0 for p in prefixes}
    counts = {p: 0 for p in prefixes}
    fill_idxs = {p: [] for p in prefixes}
    for idx, item in row.items():  # iteritems() was removed in pandas 2.0
        prefix = idx.split('_')[0]
        if prefix not in sums:
            continue  # column is not one of the metric groups
        if pd.isna(item):  # missing values are NaN, not None
            fill_idxs[prefix].append(idx)
        else:
            sums[prefix] += float(item)
            counts[prefix] += 1
    for p in prefixes:
        # fill only if this row has at least one observed value in the group
        if fill_idxs[p] and counts[p]:
            df.loc[i, fill_idxs[p]] = sums[p] / counts[p]
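For comparison, a loop-free sketch (not from either answer) is possible: transpose so the metric columns become rows, group those rows by prefix, and use `transform('mean')` to broadcast each group's mean back to every member; `fillna` with a DataFrame then fills each NaN from the same cell of that result. Transposing avoids `groupby(axis=1)`, which is deprecated in recent pandas. The toy data is hypothetical and the `<metric>_<n>` naming is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the <metric>_<n> column layout
df = pd.DataFrame({
    'mean_1': [1.0, np.nan],
    'mean_2': [3.0, 4.0],
    'std_1': [0.5, 0.2],
    'std_2': [np.nan, 0.4],
})

# Prefix of each column, e.g. "mean_1" -> "mean"
prefix = df.columns.str.split('_').str[0]

# Per-row, per-group means (NaNs skipped), broadcast back to the original shape
group_means = df.T.groupby(prefix).transform('mean').T

# Fill each NaN from the corresponding cell of group_means
filled = df.fillna(group_means)
print(filled)
```

Here `mean_1` in row 1 becomes 4.0 and `std_2` in row 0 becomes 0.5, while observed values are left unchanged.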

Upvotes: 1
