ihsansat

Reputation: 503

Change a pandas DataFrame based on a Series

I have some data that I converted to a pandas DataFrame:

import pandas as pd
d = [
  (1,70399,0.988375133622),
  (1,33919,0.981573492596),
  (1,62461,0.981426807114),
  (579,1,0.983018778374),
  (745,1,0.995580488899),
  (834,1,0.980942505189)
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])

I need to build a Series that maps the values in the source and target columns to new consecutive IDs:

e = []
for x in d:
  e.append(x[0])
  e.append(x[1])

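e = list(set(e))
df_new = pd.DataFrame(e, columns=['source_target']).sort_values(['source_target'], ascending=[True])

# new_source_old is not defined in the post; assumed here to be the original IDs
new_source_old = df_new.source_target.copy()
# Number each distinct ID consecutively, starting at 0
df_new.source_target = (df_new.source_target.diff() != 0).cumsum() - 1
new_ser = pd.Series(df_new.source_target.values, index=new_source_old).drop_duplicates()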

This gives me the Series:

source_target
1        0
579      1
745      2
834      3
33919    4
62461    5
70399    6
dtype: int64

I tried to update the DataFrame df_beda based on the new_ser Series using:

df_beda.target = df_beda.target.mask(df_beda.target.isin(new_ser), df_beda.target.map(new_ser)).astype(int)
df_beda.source = df_beda.source.mask(df_beda.source.isin(new_ser), df_beda.source.map(new_ser)).astype(int)

But the result is:

   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3     579       0  0.983019
4     745       0  0.995580
5     834       0  0.980943

That is wrong. The result I want is:

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943

Can anyone show me where my mistake is?

Thanks

Upvotes: 1

Views: 59

Answers (1)

Happy001

Reputation: 6383

If the order doesn't matter, you can do the following. Avoid for loops unless they are absolutely necessary.

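import numpy as np

# Collect every distinct ID from both columns (np.unique returns them sorted)
uniq_vals = np.unique(df_beda[['source','target']])
# Map each ID to its position in the sorted array
map_dict = dict(zip(uniq_vals, range(len(uniq_vals))))
df_beda[['source','target']] = df_beda[['source','target']].replace(map_dict)

print(df_beda)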

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943
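
As for where the original attempt went wrong: a likely cause is that `isin(new_ser)` tests membership in the values of `new_ser` (0 to 6), not in its index (the original IDs), so only the ID 1 is matched and replaced. Below is a minimal sketch, assuming the data from the question, that drops the mask and relies on `Series.map`, which looks each ID up in the index of `new_ser`. The names `ids`, `matches_values`, and `remapped` are only illustrative.

```python
import pandas as pd

d = [
    (1, 70399, 0.988375133622),
    (1, 33919, 0.981573492596),
    (1, 62461, 0.981426807114),
    (579, 1, 0.983018778374),
    (745, 1, 0.995580488899),
    (834, 1, 0.980942505189),
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])

# Every distinct ID from both columns, sorted; index = old ID, value = new ID
ids = sorted(set(df_beda['source']) | set(df_beda['target']))
new_ser = pd.Series(range(len(ids)), index=ids)

# isin() compares against the VALUES of new_ser (0..6), so only the ID 1 matches
matches_values = df_beda['target'].isin(new_ser).tolist()
print(matches_values)

# map() looks each ID up in the INDEX of new_ser, which is what is needed here
remapped = df_beda.copy()
remapped['source'] = remapped['source'].map(new_ser)
remapped['target'] = remapped['target'].map(new_ser)
print(remapped)
```

Because every ID in the frame is present in the index of `new_ser`, `map` leaves no missing values, so the mask is not needed.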

If you want to roll back, you can create an inverse map from the original one, because it is guaranteed to be a 1-to-1 mapping.

# Swap keys and values to get new ID -> original ID
inverse_map = {v:k for k,v in map_dict.items()}
df_beda[['source','target']] = df_beda[['source','target']].replace(inverse_map)
print(df_beda)

   source  target    weight
0       1   70399  0.988375
1       1   33919  0.981573
2       1   62461  0.981427
3     579       1  0.983019
4     745       1  0.995580
5     834       1  0.980943
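
To check the round trip end to end, here is a small self-contained sketch, assuming the same data as in the question. It uses `Series.map` per column instead of `replace`, which is a swapped-in choice rather than the method shown above, and the names `original`, `encoded`, and `cols` are only illustrative.

```python
import numpy as np
import pandas as pd

df_beda = pd.DataFrame(
    [
        (1, 70399, 0.988375133622),
        (1, 33919, 0.981573492596),
        (1, 62461, 0.981426807114),
        (579, 1, 0.983018778374),
        (745, 1, 0.995580488899),
        (834, 1, 0.980942505189),
    ],
    columns=['source', 'target', 'weight'],
)
original = df_beda.copy()
cols = ['source', 'target']

# Forward map: original ID -> consecutive code, in sorted order
uniq_vals = np.unique(df_beda[cols].to_numpy())
map_dict = {int(v): i for i, v in enumerate(uniq_vals)}

# Inverse map: code -> original ID (possible because the codes are unique)
inverse_map = {v: k for k, v in map_dict.items()}
assert len(inverse_map) == len(map_dict)

# Encode both columns, keep a copy, then decode again
for col in cols:
    df_beda[col] = df_beda[col].map(map_dict)
encoded = df_beda.copy()
for col in cols:
    df_beda[col] = df_beda[col].map(inverse_map)

# Reports whether the decoded frame matches the one we started with
print(df_beda.equals(original))
```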

Upvotes: 2
