Natalia Szczepanek

Reputation: 9

How to load a JSON file with multiple fields into a DataFrame, based on a few repeated values, in Python?

Is there an easy way to load a JSON file with the following structure:

('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key1', 'value1', 'col6_1')
('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key2', 'value2', 'col6_1')
('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key3', 'value3', 'col6_1')
('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key1', 'value1', 'col6_2')
('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key2', 'value2', 'col6_2')
('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key3', 'value3', 'col6_2')

to achieve:

('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key1', 'key2', 'key3', 'col6_1')
('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key1', 'key2', 'key3', 'col6_2')

with value1, value2, and value3 assigned to key1, key2, and key3 respectively?

I would like to use pandas or PySpark functions.

Upvotes: 0

Views: 48

Answers (1)

Nineteenn

Reputation: 31

The structure you show is not valid JSON, but once the data is in a DataFrame you can use DataFrame.drop_duplicates() to drop rows that repeat a given value:

import pandas as pd

# Sample data with repeated 'brand' values
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# Keep only the first row for each brand and renumber the index from 0
df.drop_duplicates(subset=['brand'], keep='first', inplace=True, ignore_index=True)
print(df)
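
Note that drop_duplicates() keeps one row per ID and discards the other key/value pairs. If the goal is to turn key1, key2, key3 into columns holding value1, value2, value3, as the question describes, a pivot is probably closer to what is needed. Here is a minimal sketch, assuming the rows have already been read into Python tuples. The column names (id, col1, col2, col3, key, value, col6) are hypothetical, since the question does not name them.

```python
import pandas as pd

# Rows as shown in the question (assumed already parsed into tuples).
rows = [
    ('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key1', 'value1', 'col6_1'),
    ('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key2', 'value2', 'col6_1'),
    ('ID_1', 'col1_1', 'col2_1', 'col3_1', 'key3', 'value3', 'col6_1'),
    ('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key1', 'value1', 'col6_2'),
    ('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key2', 'value2', 'col6_2'),
    ('ID_2', 'col1_2', 'col2_2', 'col3_2', 'key3', 'value3', 'col6_2'),
]

# Hypothetical column names; replace them with the real field names.
columns = ['id', 'col1', 'col2', 'col3', 'key', 'value', 'col6']
df = pd.DataFrame(rows, columns=columns)

# Columns that repeat for every row of the same ID.
index_cols = ['id', 'col1', 'col2', 'col3', 'col6']
# Distinct keys in order of first appearance; these become new columns.
key_cols = list(df['key'].unique())

# Spread the 'key' values into columns filled from 'value'.
# aggfunc='first' takes the first value if a key repeats within one ID.
wide = (
    df.pivot_table(index=index_cols, columns='key', values='value', aggfunc='first')
      .reset_index()
      .rename_axis(columns=None)
)

# Reorder so the key columns sit between col3 and col6, as in the question.
wide = wide[['id', 'col1', 'col2', 'col3', *key_cols, 'col6']]
print(wide)
```

The result has one row per ID, with columns key1, key2, key3 containing the corresponding values. In PySpark, groupBy(...).pivot('key') followed by an aggregation such as first('value') follows the same idea.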

Upvotes: 0
