Reputation: 4701
I want to load a text file (borrowed from here) in which each line is a JSON string like the following:
{"overall": 2.0, "verified": true, "reviewTime": "02 4, 2014", "reviewerID": "A1M117A53LEI8", "asin": "7508492919", "reviewerName": "Sharon Williams", "reviewText": "DON'T CARE FOR IT. GAVE IT AS A GIFT AND THEY WERE OKAY WITH IT. JUST NOT WHAT I EXPECTED.", "summary": "CASE", "unixReviewTime": 1391472000}
I would like to extract only the reviewText and overall fields from the dataset using TensorFlow, but I am getting the following error:
AttributeError: in user code:
<ipython-input-4-419019a35c5e>:9 None *
line_dataset = line_dataset.map(lambda row: transform(row))
<ipython-input-4-419019a35c5e>:2 transform *
str_example = example.numpy().decode("utf-8")
AttributeError: 'Tensor' object has no attribute 'numpy'
My code snippet looks like this:
def transform(example):
    str_example = example.numpy().decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(lambda row: transform(row))

for example in line_dataset.take(5):
    print(example)
I am using TensorFlow 2.3.0.
Upvotes: 2
Views: 5372
Reputation: 957
A bit wordy, but try it like this:
import json
import tensorflow as tf

def transform(example):
    # Inside tf.py_function, example is an eager tensor, so .numpy() works
    str_example = example.numpy().decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(
    lambda row:
        tf.py_function(transform, [row], (tf.float32, tf.string))
)

for example in line_dataset.take(5):
    print(example)
This approach works for any Python function, not only for NumPy functions. So if you need functions like print, input, and so on, you can use this. You don't have to know all the details, but if you are interested, please ask me. :)
Upvotes: 1
Reputation: 59741
The input pipeline of a dataset is always traced into a graph (as if you used @tf.function) to make it faster, which means, among other things, that you cannot use .numpy(). You can, however, use tf.numpy_function to access the data as a NumPy value within the graph:
import json
import tensorflow as tf

def transform(example):
    # example now arrives as a NumPy value (bytes for a scalar string), not a Tensor
    str_example = example.decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(
    # inp is passed as a list of tensors
    lambda row: tf.numpy_function(transform, [row], (tf.float32, tf.string)))

for example in line_dataset.take(5):
    print(example)
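
Since the function passed to tf.numpy_function is plain Python, its parsing logic can be checked on a single line without building a dataset. The following is only a minimal sketch: parse_review is a hypothetical name, the sample record is a shortened version of the one in the question, and the fallback values for missing fields (0.0 and an empty string) are my own assumption, not something the dataset prescribes.

```python
import json

def parse_review(line: bytes):
    """Parse one JSON line (as bytes) into an (overall, reviewText) pair."""
    record = json.loads(line.decode("utf-8"))
    # Assumed defaults: fall back to 0.0 / "" instead of None when a field is missing
    overall = float(record.get("overall", 0.0))
    text = record.get("reviewText", "")
    return overall, text

# Shortened version of the sample record from the question
sample = b'{"overall": 2.0, "verified": true, "reviewText": "JUST NOT WHAT I EXPECTED.", "summary": "CASE"}'

overall, text = parse_review(sample)
print(overall, text)
```

A function like this could be used in place of transform above, because it takes bytes and returns a (float, str) pair that matches the (tf.float32, tf.string) output types.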
Upvotes: 3