kalle

Reputation: 435

Replacing array values in a Pandas DataFrame via iteration

I am working with a Pandas DataFrame in which one column holds an array (list) of tags per row, as in the following example:

    user_id    tags
0      1       [a,b,c]
1      2       [a,b,d]
2      3       [b,c]
...
n      n       [a,d]

I have a JSON object that maps each simplified tag id to its full tag name, and I am trying to replace the simplified entries with their full names using the following loop:

for user_tags in dataset['tags']:
    for tag in user_tags:
        for full_tag in UUIDtags['tags_full']:
            if full_tag['id'] == tag:
                tag = full_tag['name']

Here id is the simplified tag and name is the corresponding full tag name in the JSON object.

However, the values in the DataFrame are unchanged after this runs. Is there a Pandas method that I am missing for replacing these values? I am afraid of replacing the entire array rather than the individual entries.

Thank you!

EDIT: An example of what the JSON object (UUIDtags) contains.

{
    "tags_full": [{
        "id": "a",
        "name": "Alpha"
    }, {
        "id": "b",
        "name": "Beta"
....

Upvotes: 1

Views: 1107

Answers (1)

spies006

Reputation: 2927

Create sample data.

>>> import pandas as pd
>>> df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
...                    'user_id': [1, 2, 3]})

>>> df
        tags  user_id
0  [a, b, c]        1
1  [a, b, d]        2
2     [b, c]        3

Generate a replacement dictionary with the letters as keys and the full tag names as values.

>>> replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

Now for the solution: iterate over the rows and over the tags in each row, replacing each tag in place with its corresponding value in replace_dict.

>>> for row in range(len(df)):
...     for tag in range(len(df.loc[row, 'tags'])):
...         df.loc[row, 'tags'][tag] = replace_dict[df.loc[row, 'tags'][tag]]
... 

Here is the result.

>>> df
                     tags  user_id
0  [Alpha, Beta, Charlie]        1
1    [Alpha, Beta, Delta]        2
2         [Beta, Charlie]        3
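
As an alternative sketch (not part of the loop above), the same replacement can be written by building a new list for each row with Series.apply. The loop in the question only rebinds the local name tag, which leaves the list untouched; constructing a fresh list avoids that problem. The helper name expand_tags is hypothetical, and the use of dict.get to leave unknown ids unchanged is an assumption about the desired behaviour.

```python
import pandas as pd

# Sample data mirroring the frame above.
df = pd.DataFrame({
    'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
    'user_id': [1, 2, 3],
})

replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}


def expand_tags(tags):
    # Hypothetical helper: build a new list of full names.
    # Ids missing from replace_dict are kept as-is (an assumption).
    return [replace_dict.get(tag, tag) for tag in tags]


# Replace the whole column with the expanded lists.
df['tags'] = df['tags'].apply(expand_tags)
print(df)
```

Using replace_dict[tag] instead of replace_dict.get(tag, tag) would raise a KeyError on an unknown id, which may be preferable if every tag is expected to have a full name.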

Side note: replace_dict above was built ad hoc from the letters that appear in my sample data. To generate such a replacement dictionary for your full data, you could do the following.

For example, let's suppose UUIDtags is your full JSON object:

>>> UUIDtags = {'tags_full': [{'id':'a', 'name':'Alpha'}, {'id':'b', 'name':'Beta'}]}

We can generate a replacement dict like this:

>>> uuidtags_dict = {}
>>> for tag in UUIDtags['tags_full']:
...     uuidtags_dict[tag['id']] = tag['name']
... 
>>> uuidtags_dict
{'a': 'Alpha', 'b': 'Beta'}

This approach scales to your entire JSON object, assuming every entry follows the id/name structure shown in the sample from your edit.
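
For reference, the same dictionary can also be built with a dict comprehension. The sketch below assumes the JSON arrives as text and is parsed with json.loads; the variable raw and its contents are hypothetical stand-ins for your real data.

```python
import json

# Hypothetical JSON text standing in for the real source (file, API response, ...).
raw = '{"tags_full": [{"id": "a", "name": "Alpha"}, {"id": "b", "name": "Beta"}]}'
UUIDtags = json.loads(raw)

# Map each simplified id to its full name in one pass.
uuidtags_dict = {tag['id']: tag['name'] for tag in UUIDtags['tags_full']}
print(uuidtags_dict)
```

If two entries share the same id, the later one overwrites the earlier one, just as in the explicit loop.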

Upvotes: 1
