Chirag Sharma

Reputation: 77

How to print a particular column from an Avro file using Python

I have the following code, which prints all the values from the Avro file. However, I want to print only a specific column. For example, given records like:

{'key1': value1, 'key2': value2}

I want to print all values of 'key1' present in the Avro file.

Here is my code:

from avro.datafile import DataFileReader
from avro.io import DatumReader
reader = DataFileReader(open("abc.avro", "rb"), DatumReader())
for user in reader:
    print(user)

reader.close()

I am new to Avro and big data in general.

Edit:

Here is the corrected code (thanks to @Rithin):

for user in reader:
    print(user['key1'])

This prints all the values corresponding to 'key1'.

Upvotes: 0

Views: 1072

Answers (1)

Rithin Chalumuri

Reputation: 1839

From the docs:

The DataFileReader is an iterator that returns dicts corresponding to the serialized items.

Since it yields one dictionary per record (it is an iterator, not a list), you can access a field with row['key'].

Combining this with a list comprehension collects the value of a key from every row.

Example:

all_values = [row['key1'] for row in reader]
print(all_values)
# [value1]
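Because the DataFileReader is an iterator, it is normally consumed after one pass, so a second list comprehension over the same reader would come back empty. If you need more than one column, one option is to gather them all in a single pass. The sketch below is a minimal illustration, not part of the avro package: `collect_columns` is a hypothetical helper, and a plain generator (`fake_reader`, also hypothetical) stands in for a real DataFileReader so the example runs without an .avro file.

```python
from typing import Any, Dict, Iterable, List


def collect_columns(records: Iterable[Dict[str, Any]], keys: List[str]) -> Dict[str, List[Any]]:
    """Hypothetical helper: gather several fields from an iterable of dicts in one pass."""
    columns: Dict[str, List[Any]] = {key: [] for key in keys}
    for record in records:  # a single pass, so a one-shot iterator is fine
        for key in keys:
            columns[key].append(record[key])
    return columns


# Hypothetical stand-in for DataFileReader: a generator of dicts, usable only once
def fake_reader():
    yield {'key1': 'a', 'key2': 1}
    yield {'key1': 'b', 'key2': 2}


result = collect_columns(fake_reader(), ['key1', 'key2'])
print(result)  # {'key1': ['a', 'b'], 'key2': [1, 2]}

# Iterating the same iterator twice: the second pass finds nothing left
reader = fake_reader()
first = [row['key1'] for row in reader]
second = [row['key1'] for row in reader]
print(first, second)  # ['a', 'b'] []
```

With a real file you would pass the reader itself as `records`, e.g. `collect_columns(DataFileReader(open("abc.avro", "rb"), DatumReader()), ['key1', 'key2'])`, and close it afterwards.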

To save the resulting list to JSON, you can do:

import json

result = {'key1': all_values}

with open('output.json', 'w') as json_file:
    json.dump(result, json_file)

You can read more about saving to JSON here.
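To check what json.dump actually wrote, you can read the file back with json.load. This is a minimal sketch, assuming example string values in place of the real all_values and a temporary directory instead of output.json in the working directory. Note that json.dump only accepts JSON-serializable values (strings, numbers, lists, dicts, booleans, None).

```python
import json
import os
import tempfile

# Example values standing in for all_values from the list comprehension
all_values = ['value1', 'value2']

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.json')
    with open(path, 'w') as json_file:
        json.dump({'key1': all_values}, json_file)

    # Read the file back to confirm what was written
    with open(path) as json_file:
        loaded = json.load(json_file)

print(loaded)  # {'key1': ['value1', 'value2']}
```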


To save the resulting list to CSV, you can do:

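import csv

# newline='' is recommended by the csv docs to avoid extra blank lines on Windows
with open('output.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # writerows expects an iterable of rows, so wrap each value in a one-element list
    writer.writerows([value] for value in all_values)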

You can read more about working with CSV files here.
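Note that csv.writer's writerows expects an iterable of rows, where each row is itself an iterable of fields. Passing a flat list of strings makes each string a row of single characters (and a flat list of numbers raises an error). The sketch below uses io.StringIO in place of a real file to show the difference; the example values and the 'key1' header row are assumptions for illustration.

```python
import csv
import io

all_values = ['abc', 'de']  # example values standing in for all_values

# Plain strings passed to writerows: each string is split into characters
buggy = io.StringIO()
csv.writer(buggy).writerows(all_values)
print(repr(buggy.getvalue()))  # 'a,b,c\r\nd,e\r\n'

# Wrapping each value in a one-element list writes one value per line,
# preceded here by a header row naming the column
fixed = io.StringIO()
writer = csv.writer(fixed)
writer.writerow(['key1'])
writer.writerows([value] for value in all_values)
print(repr(fixed.getvalue()))  # 'key1\r\nabc\r\nde\r\n'
```

The `\r\n` line endings come from csv.writer's default lineterminator.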

Upvotes: 1
