dataelephant

Reputation: 563

Append Dictionary Elements into an Empty Pandas Dataframe Column

I have data in a pandas dataframe which looks like this:

queryName   Market  tags    categoryDetails
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892695401'}, {'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '21892718'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '229565351'}]
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892659'}, {'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892667'}, {'name': 'irrelevant_data', 'parentName': 'irrelevant_scrape', 'parentId': '2662610', 'id': '268415777'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '2565351'}]
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892695401'}, {'name': 'irrelevant_data', 'parentName': 'irrelevant_scrape', 'parentId': '2662610', 'id': '268415777'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '229565351'}, {'name': 'Consideration', 'parentName': 'irrelevant_scrape', 'parentId': '2203873', 'id': '2203874'}]
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892695401'}, {'name': 'irrelevant_data', 'parentName': 'irrelevant_scrape', 'parentId': '2662610', 'id': '268415777'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '229565351'}]
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '21892718'}, {'name': 'irrelevant_data', 'parentName': 'irrelevant_scrape', 'parentId': '2662610', 'id': '268415777'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '229565351'}]
dummy_query (dummy_market)  dummy_market    dummy_tag   [{'name': 'relevant_data', 'parentName': 'relevant_scrape', 'parentId': '289245228', 'id': '2892659'}, {'name': 'dummy_data', 'parentName': 'Location', 'parentId': '21221517840', 'id': '229565351'}, {'name': 'dummy_data', 'parentName': 'irrelevant_scrape', 'parentId': '2203873', 'id': '2203880'}]

I need to add a fifth column to my dataframe that holds, for each row, the "name" value of every dictionary in categoryDetails whose parentName is 'relevant_scrape'. In the sample above, those are the entries named "relevant_data".

How should I go about doing this? Here is my code so far.

import pandas as pd
import json
from pandas import DataFrame, read_csv

df = pd.read_csv('dataset.csv', sep = '\t')
for row in df.categoryDetails:
    if isinstance(row, str):
        list_dicts = json.loads(row.replace("'", "\""))
        for each_dict in list_dicts:
            if each_dict["parentName"] == "relevant_scrape":
                df['fifth_column'] = each_dict["name"]

df.to_csv('output.txt', sep = '\t')

(Note: my original data is a bit messy and couldn't be parsed as JSON until I replaced its single quotation marks with double quotation marks. Hence the replace inside the json.loads call.)

This gives me a dataframe with a fifth column, but every row contains the exact same "name" value. Any and all help is appreciated, thank you.

Upvotes: 2

Views: 1373

Answers (1)

OmerBA

Reputation: 842

You are using df['fifth_column'] = each_dict["name"], which assigns a single scalar to the whole column. pandas broadcasts a scalar to every row, so each iteration overwrites the entire 'fifth_column' column, and you end up with only the last matching name in every row.
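A minimal illustration of that overwrite, using a made-up three-row frame (the column "x" and the names in the loop are placeholders, not taken from your data):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})  # hypothetical stand-in for your data

for name in ["a", "b", "c"]:
    # A scalar on the right-hand side is written to every row,
    # so each pass replaces the whole column.
    df["fifth_column"] = name

values = df["fifth_column"].tolist()
print(values)  # ['c', 'c', 'c']
```

Only the value from the final iteration survives, which matches what you are seeing.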

Maybe you should try the following snippet:

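import json

def extract_details(row):
    # Skip non-string cells (e.g. NaN); they yield None in the new column.
    if not isinstance(row, str):
        return None
    # The data uses single quotes, so swap them for double quotes before parsing as JSON.
    list_dicts = json.loads(row.replace("'", "\""))
    # Collect every "name" whose parentName is "relevant_scrape".
    all_relevant_data = []
    for each_dict in list_dicts:
        if each_dict["parentName"] == "relevant_scrape":
            all_relevant_data.append(each_dict["name"])
    return ','.join(all_relevant_data)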

and then you could do:

df['fifth_column'] = df.categoryDetails.apply(extract_details)
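For completeness, here is a self-contained sketch of the same per-row approach that you can run without your CSV. It makes two assumptions that are mine, not from your code: the helper name extract_relevant_names and the two-row sample frame are invented for illustration, and ast.literal_eval is swapped in for the quote replacement plus json.loads, since it parses single-quoted Python literals directly and would not break on a value that contains an apostrophe.

```python
import ast

import pandas as pd


def extract_relevant_names(cell, parent="relevant_scrape"):
    """Return the comma-joined 'name' values whose parentName equals `parent`."""
    # Non-string cells (e.g. missing values) yield None.
    if not isinstance(cell, str):
        return None
    records = ast.literal_eval(cell)  # parses "[{'name': ...}, ...]" as a list of dicts
    return ",".join(d["name"] for d in records if d.get("parentName") == parent)


# Hypothetical sample standing in for dataset.csv: one populated row, one missing.
sample = pd.DataFrame({
    "categoryDetails": [
        "[{'name': 'relevant_data', 'parentName': 'relevant_scrape', "
        "'parentId': '289245228', 'id': '2892659'}, "
        "{'name': 'dummy_data', 'parentName': 'Location', "
        "'parentId': '21221517840', 'id': '2565351'}]",
        None,
    ]
})

# apply calls the function once per cell, so each row gets its own result.
sample["fifth_column"] = sample["categoryDetails"].apply(extract_relevant_names)
```

If your real data turns out to be valid JSON after all, json.loads is the better fit; literal_eval is only a convenience for Python-style literals like the ones shown in the question.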

Upvotes: 1
