Reputation: 435
I am working with a Pandas DataFrame that has a column whose entries are arrays (lists), as in the following example:
   user_id  tags
0        1  [a,b,c]
1        2  [a,b,d]
2        3  [b,c]
...
n        n  [a,d]
I have some tag ids that correlate to the simplified tags in a JSON object and am trying to replace the entries with their non-simplified variants with the following method:
    for user_tags in dataset['tags']:
        for tag in user_tags:
            for full_tag in UUIDtags['tags_full']:
                if full_tag['id'] == tag:
                    tag = full_tag['name']
Here id and name are the corresponding simplified tags and full tag names in the JSON object.
However, this does not change the value upon execution; is there a Pandas method that I am missing to replace these values? I am afraid that I will replace the entire array rather than replace the individual entries.
Thank you!
EDIT: An example of what the JSON object (UUIDtags) contains:
    {
        "tags_full": [{
            "id": "a",
            "name": "Alpha"
        }, {
            "id": "b",
            "name": "Beta"
            ....
Upvotes: 1
Views: 1107
Reputation: 2927
Create sample data.
>>> import pandas as pd
>>> df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
...                    'user_id': [1, 2, 3]})
>>> df
        tags  user_id
0  [a, b, c]        1
1  [a, b, d]        2
2     [b, c]        3
Generate a replacement dictionary with the letters as keys and the full tag names as values.
>>> replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}
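Now iterate over the rows and over the tags in each row, replacing each tag in place with the corresponding value from replace_dict. Assigning by index is what makes the change stick: rebinding the loop variable, as in your code, leaves the list untouched.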
>>> for row in range(len(df)):
...     for tag in range(len(df.loc[row, 'tags'])):
...         df.loc[row, 'tags'][tag] = replace_dict[df.loc[row, 'tags'][tag]]
...
Here is the result.
>>> df
                     tags  user_id
0  [Alpha, Beta, Charlie]        1
1    [Alpha, Beta, Delta]        2
2         [Beta, Charlie]        3
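As an alternative, here is a minimal sketch that does the same replacement without explicit index loops, by building a new list for each row with Series.apply. It assumes every tag has a key in replace_dict (a missing one would raise a KeyError), and it rebuilds the sample df so that it runs on its own.

```python
import pandas as pd

# Same sample data and replacement dictionary as above.
df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b', 'd'], ['b', 'c']],
                   'user_id': [1, 2, 3]})
replace_dict = {'a': 'Alpha', 'b': 'Beta', 'c': 'Charlie', 'd': 'Delta'}

# Build a new list for each row instead of mutating the stored lists in place.
df['tags'] = df['tags'].apply(lambda tags: [replace_dict[t] for t in tags])

result = df['tags'].tolist()
print(result)
```

This replaces each list as a whole, but every new list is built element by element from the old one, so the individual entries are still what gets translated.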
Side note:
The replace_dict above was created ad hoc from the letters that appear in my sample data. For your full data, you could generate such a replacement dictionary from the JSON object instead.
For example, let's suppose UUIDtags is your full JSON object:
>>> UUIDtags = {'tags_full': [{'id':'a', 'name':'Alpha'}, {'id':'b', 'name':'Beta'}]}
We can generate a replacement dict like this:
>>> uuidtags_dict = {}
>>> for tag in UUIDtags['tags_full']:
...     uuidtags_dict[tag['id']] = tag['name']
...
>>> uuidtags_dict
{'a': 'Alpha', 'b': 'Beta'}
This way of generating the replacement dictionary will scale to your entire JSON object, assuming every entry has the same structure as the sample in your edit.
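The same dictionary can also be built in one line with a dict comprehension. The sketch below additionally shows one way to handle tags that have no entry in the JSON object: expand_tags is a hypothetical helper (not part of pandas) that uses dict.get to fall back to the original tag, and the 'z' tag is made up purely to illustrate the fallback.

```python
UUIDtags = {'tags_full': [{'id': 'a', 'name': 'Alpha'},
                          {'id': 'b', 'name': 'Beta'}]}

# Map each simplified id to its full name.
uuidtags_dict = {tag['id']: tag['name'] for tag in UUIDtags['tags_full']}


def expand_tags(tags, mapping):
    """Hypothetical helper: return a new list with known tags replaced.

    Tags that are missing from the mapping are kept unchanged.
    """
    return [mapping.get(t, t) for t in tags]


expanded = expand_tags(['a', 'b', 'z'], uuidtags_dict)
print(expanded)
```

Whether an unknown tag should be kept, dropped, or treated as an error depends on your data, so the fallback here is only one possible choice.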
Upvotes: 1