Reputation: 95
Hi, I have code like this to consume Kafka data:
from kafka import KafkaConsumer

bootstrap_servers = ['localhost:9092']
topicName = 'testapp5'
consumer = KafkaConsumer(topicName, group_id='group1', bootstrap_servers=bootstrap_servers)
for msg in consumer:
    print("Topic Name=%s,Message=%s" % (msg.topic, msg.value))
Then I want to parse each message with
message = json.loads(msg.value)
The output:
{'request_id': 'f84c55fd-c730-49ba-83b2-47b04643b706',
'data': {'age': 24,
'workclass': 'Self-emp-not-inc',
'fnlwgt': 188274,
'education': 'Bachelors',
'marital_status': 'Never-married',
'occupation': 'Sales',
'relationship': 'Not-in-family',
'race': 'White',
'gender': 'Male',
'capital_gain': 0,
'capital_loss': 0,
'hours_per_week': 50,
'native_country': 'United-States',
'income_bracket': '<=50K.'}}
Then I want to convert the data to a pandas DataFrame with
row = pd.DataFrame(message, index=[0])
and the output:
What should I do so that the JSON from Kafka can be loaded into a pandas DataFrame? Thanks in advance.
Upvotes: 0
Views: 608
Reputation: 31166
The simplest approach is to use json_normalize, which flattens the nested dict into columns. If you just want the data, you can pass the "data" key of the dict to pd.DataFrame.
js = {'request_id': 'f84c55fd-c730-49ba-83b2-47b04643b706',
'data': {'age': 24,
'workclass': 'Self-emp-not-inc',
'fnlwgt': 188274,
'education': 'Bachelors',
'marital_status': 'Never-married',
'occupation': 'Sales',
'relationship': 'Not-in-family',
'race': 'White',
'gender': 'Male',
'capital_gain': 0,
'capital_loss': 0,
'hours_per_week': 50,
'native_country': 'United-States',
'income_bracket': '<=50K.'}}
# simplest....
pd.json_normalize(js)
# if request_id is not needed
pd.DataFrame(js["data"], index=[0])
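To tie this back to the consumer loop, here is a minimal sketch of collecting several messages into one DataFrame. It assumes each msg.value is a UTF-8 JSON payload shaped like the one in the question. The raw_values list, the "req-1"/"req-2" ids and the messages_to_frame helper are made up for illustration and stand in for values read from Kafka.

```python
import json

import pandas as pd

# Hypothetical stand-ins for msg.value. kafka-python delivers the payload
# as bytes unless a value_deserializer is configured on the consumer.
raw_values = [
    b'{"request_id": "req-1", "data": {"age": 24, "occupation": "Sales"}}',
    b'{"request_id": "req-2", "data": {"age": 31, "occupation": "Tech-support"}}',
]


def messages_to_frame(values):
    """Decode each JSON payload and flatten the records into one DataFrame."""
    records = [json.loads(v) for v in values]  # json.loads accepts bytes
    # Nested keys become dotted column names such as "data.age".
    return pd.json_normalize(records)


df = messages_to_frame(raw_values)
print(df)
```

In the real loop you would append msg.value (or the already parsed dict) to a list and build the DataFrame once per batch, which is usually cheaper than creating a one-row DataFrame per message.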
Upvotes: 1