Replacing strings in Pandas DataFrame column with array of entries based on dict

Question

I have a DataFrame such as:

     tag1   other
0    a,c      foo
1    b,c      foo
2    d        foo
3    a,a      foo

Of which the entries are strings delimited by commas.

And a dict of definitions for each tag such as:

dict = {'a' : 'Apple',
'b' : 'Banana',
'c' : 'Carrot'}

I would like to replace the definitions of a, b, and c but delete rows in which there is something not within that dict (i.e. d). Furthermore, I'd like to ensure there are no duplicates, such as row index 3 in the example dataset.

What I have so far:

df.tags = df.tags.str.split(',')
for index, row in df.iterrows():
    names = []
    for tag in row.tag1:
            if tag == dict[tag]:
                names.append(dict[tag])
            else:
                 df.drop(df.index[index])

From there I would replace the original column with the values in names. To replace duplicates, I am thinking of iterating over the array and checking if the next value matches the next, and if so, deleting it. However, this is not working and I am a bit stumped. The desired output would look like (with strings in unicode):

     tag1                     other
0    ['Apple', 'Carrot']      foo
1    ['Banadn', 'Carrot']     foo
3    ['Apple']                foo

Replacing strings in Pandas DataFrame column with array of entries based on dict

Answers (1)

Related Questions