The Great

Reputation: 7703

pandas link column using list item

I have two dataframes df and tf as given below

df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},{"unique_key": 3, "test_ids":np.nan},
     {"unique_key": 4, "test_ids":np.nan}]
df = pd.DataFrame(df)

test_ids,status,revenue,cnt_days     
1,passed,234.54,3          
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6             
51,passed,232,21     
75,failed,123.87,32 

tf = pd.read_clipboard(sep=',')
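For a self-contained, reproducible example (read_clipboard depends on whatever is currently copied), tf can also be built from the same text, for example:

import io

data = """test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32"""

# same table as above, parsed without the clipboard
tf = pd.read_csv(io.StringIO(data))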

I would like to link the unique_key column from df to the tf dataframe

For example, I will show my expected output below (that's easier to understand than text)

I was trying something like below

for key, b in zip(df['unique_key'], df['test_ids']):
    if pd.isna(b):          # skip rows where test_ids is missing
        continue
    for a in b.split(','):
        val = float(a)
        if pd.isna(val):    # exclude NaN values from checking
            continue
        for i in range(len(tf)):
            if int(val) == tf['test_ids'][i]:
                tf.loc[i, 'unique_key'] = key
                 

But this is neither an efficient nor an elegant way to solve my problem.

Is there a better way to achieve the expected output shown below?

   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32

Upvotes: 1

Views: 40

Answers (1)

jezrael

Reputation: 862611

You can create a Series with duplicates and missing values removed, swap it to a dictionary, and for the new first column use DataFrame.insert with Series.map:

s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)
       .dropna()
       .astype(int)
       .drop_duplicates())
d = {v: k for k, v in s.items()}
print (d)
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
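
Note that any test_ids in tf without a match in df would map to NaN, which turns the new column into floats. A minimal variation of the insert above (assuming the same d) that keeps an integer-like column uses pandas' nullable Int64 dtype:

# unmatched test_ids become <NA> instead of forcing the column to float
tf.insert(0, 'unique_key', tf['test_ids'].map(d).astype('Int64'))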

Another idea is to work with the DataFrame and create a Series for mapping:

s = (df.assign(new = df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new':float})
       .dropna(subset=['new'])
       .astype({'new':int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])

print (s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64

tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
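
A further sketch (not from the answer above, shown only as an alternative under the same column names) links the frames with DataFrame.merge on an exploded lookup table instead of map:

# one row per (unique_key, test_id) pair, used as a join table
lookup = (df.assign(new=df['test_ids'].str.split(','))
            .explode('new')
            .astype({'new': float})
            .dropna(subset=['new'])
            .astype({'new': int})
            .drop_duplicates(subset=['new'])
            [['unique_key', 'new']]
            .rename(columns={'new': 'test_ids'}))

# left join keeps every row of tf, matched or not; unique_key lands as the
# last column and can be moved with insert/reindex if needed
out = tf.merge(lookup, on='test_ids', how='left')
print(out)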

Upvotes: 1
