Reputation: 77
I have the following code, which prints all the values from an Avro file. However, I want to print only a specific column. For example, given records like:
{'key1': value1 , 'key2': value2}
I want to print all values of 'key1' present in the Avro file.
Here is my code:
from avro.datafile import DataFileReader
from avro.io import DatumReader
reader = DataFileReader(open("abc.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()
I am new to Avro and big data.
Edit:
Here is the corrected code. Thanks to @Rithin
for user in reader:
    print(user['key1'])
This prints every value stored under 'key1'.
Upvotes: 0
Views: 1072
Reputation: 1839
From the docs:
The DataFileReader is an iterator that returns dicts corresponding to the serialized items.
Since iterating over the reader yields one dictionary per record, you can access a field with row['key'].
Combining this with a list comprehension collects that field's value from every row.
Example:
all_values = [row['key1'] for row in reader]
print(all_values)
[value1]
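As a rough sketch of the same idea in reusable form, the helper below collects one field from any iterable of dicts, which is what iterating over the reader yields. The name column_values, its default parameter, and the sample records are hypothetical and only for illustration. The sample list stands in for the reader so the snippet runs without an Avro file. Using .get is a defensive choice in case a record lacks the key.

```python
from typing import Any, Iterable, List, Mapping


def column_values(records: Iterable[Mapping[str, Any]], key: str,
                  default: Any = None) -> List[Any]:
    """Collect one field from every record in a single pass.

    `records` can be any iterable of dict-like rows; `default` is used
    when a row has no such key (hypothetical helper, not part of avro).
    """
    return [record.get(key, default) for record in records]


# Stand-in for the rows a reader would yield (made-up sample data).
sample_records = [
    {"key1": "a", "key2": 1},
    {"key1": "b", "key2": 2},
    {"key2": 3},  # a row without 'key1'
]

values = column_values(sample_records, "key1")
print(values)
```

With a real file you would pass the reader itself instead of sample_records. Because the reader is consumed as you iterate, collect everything you need in that one pass.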
To save the resulting list as JSON, you can:
import json
result = {'key1':all_values}
with open('output.json', 'w') as json_file:
    json.dump(result, json_file)
You can read more about saving to json here.
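One caveat: the json module raises a TypeError for values it cannot encode, such as Python bytes objects. If your column might contain such values (an assumption about your data, not something stated in the question), a fallback passed through the default parameter is one way to handle it. The to_jsonable function and the sample values below are hypothetical, and encoding bytes as hex text is just one possible choice.

```python
import json


def to_jsonable(value):
    """Fallback used only for values json cannot encode natively.

    Assumption for this sketch: bytes become hex text, and anything
    else falls back to its string form.
    """
    if isinstance(value, bytes):
        return value.hex()
    return str(value)


# Made-up sample column containing a bytes value.
all_values = ["a", 1, b"\x00\x01"]

json_text = json.dumps({"key1": all_values}, default=to_jsonable)
print(json_text)
```

The same default argument works with json.dump when writing to a file.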
To save the resulting list as CSV, you can:
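import csv
# newline='' lets the csv module manage line endings itself
with open('output.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # writerows expects an iterable of rows, so wrap each value in a one-item list
    writer.writerows([value] for value in all_values)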
You can read more about working with csv files here.
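If you also want a header line, a minimal sketch could look like the following. The function name write_column_csv, the temporary output path, and the sample values are made up for illustration. Each value is written as its own single-column row, and the file is read back at the end to check the result.

```python
import csv
import os
import tempfile


def write_column_csv(path, header, values):
    """Write a one-column CSV: a header row, then one row per value."""
    # newline='' lets the csv module control line endings
    with open(path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([header])
        # each value becomes its own single-cell row
        writer.writerows([value] for value in values)


# Made-up sample values standing in for the collected column.
all_values = ["value1", "value2"]
output_path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_column_csv(output_path, "key1", all_values)

# Read the file back to confirm what was written.
with open(output_path, newline="") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows)
```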
Upvotes: 1