How to decode only when necessary in Python

Question

I have a mixed dataset in which some values are strings and some are bytes, as follows.

mydata = {'data mining': [b'data', b'text mining', b'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence']}

My code is as follows:

for key, value in mydata.items():
    for item in value:
        print(type(item))

Since some of the values are bytes, I wanted to convert them to strings, so I changed the code above as follows:

for key, value in mydata.items():
    for item in value:
        print(type(item.decode("utf-8")))

However, I then get the error: AttributeError: 'str' object has no attribute 'decode'

I also tried:

for key, value in mydata.items():
    for item in value:
        if type(item) == 'str':
            print(type(item))

But it did not work for me.

Is there a way to resolve this issue?

benvc · Accepted Answer

The following implements the suggestions from the comments: check whether each list element is a bytes object and decode it if so. Because bytes objects are immutable, the list element is replaced with its decoded version rather than modified in place.

mydata = {'data mining': [b'data', b'text mining', b'artificial intelligence'], 'neural networks': ['cnn', 'rnn', "artificial intelligence"]}

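for items in mydata.values():
    for i, item in enumerate(items):
        # Only bytes objects have .decode(); str elements are left as they are.
        if isinstance(item, bytes):
            items[i] = item.decode()  # decode() defaults to UTF-8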

print(mydata)
# OUTPUT
# {'data mining': ['data', 'text mining', 'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence']}
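
If you would rather not modify the lists in place, the same isinstance check can be wrapped in a small helper and applied in a dict comprehension to build a new dictionary. This is only a minimal sketch: the helper name to_str and the variable name decoded are made up for illustration, and it assumes the bytes values are UTF-8 encoded.

```python
def to_str(item, encoding="utf-8"):
    """Return item decoded to str if it is bytes; otherwise return it unchanged."""
    # to_str is a hypothetical helper name; UTF-8 is an assumed encoding.
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item


mydata = {
    'data mining': [b'data', b'text mining', b'artificial intelligence'],
    'neural networks': ['cnn', 'rnn', 'artificial intelligence'],
}

# Build a new dict with new lists; the original mydata is left untouched.
decoded = {key: [to_str(item) for item in items] for key, items in mydata.items()}

print(decoded)
```

As a side note, the check in the question never matched because type(item) == 'str' compares a type object with a string literal, which is always False. isinstance(item, str) or type(item) is str would be the working forms.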
