Manasa Kota
Manasa Kota

Reputation: 9

How to extract JSON list column from JSON file and convert it into dataframe

I have a JSON file that consists of 1m of data. I wanted to extract the skills based on id so that each id has different skills. Can anyone suggest to me how to extract the skills column from the JSON file and convert it into a data frame? I wanted only skill among three columns present in the skills list.

I am attaching a few rows from the JSON file.

{
  "id": "3d86309e-64f6-4df8-ba60-cce431870bfb",
  "location": {
    "city": "Assen",
    "country": "Netherlands",
    "longitude": 6.564228534698486,
    "latitude": 52.99275207519531
  },
  "educations": [
    {
      "title": "Bachelor Bedrijfskundige Informatica",
      "institution": "Hanzehogeschool Groningen / Hanze University of Applied Sciences Groningen",
      "start_date": "2002-01-01",
      "end_date": "2005-12-31",
      "ongoing": false,
      "edu_type_id": 16870377,
      "edu_cat_id": 951006,
      "level": 11
    },
    {
      "title": "Bachelor Public Relations, Marketing, Communication",
      "institution": "NHL Hogeschool",
      "start_date": "1994-01-01",
      "end_date": "1997-12-31",
      "ongoing": false,
      "edu_type_id": 953096,
      "edu_cat_id": 951099,
      "level": 11
    },
    {
      "title": " ",
      "institution": "Gomarus College",
      "start_date": null,
      "end_date": null,
      "ongoing": false,
      "edu_type_id": null,
      "edu_cat_id": null,
      "level": null
    }
  ],
  "work_experiences": [
    {
      "title": "medewerker ICT business development",
      "company_name": "Woningcorporatie Actium",
      "location": {
        "city": null,
        "country": null,
        "longitude": null,
        "latitude": null
      },
      "start_date": "2014-10-01",
      "end_date": null,
      "classification": {
        "function_type": "Sales, business development and key account managers",
        "function_type_id": 1568086,
        "function_cat": "Sales, account and business development managers and representatives",
        "function_cat_id": 1567386,
        "level": 5
      }
    },
    {
      "title": "Functioneel Applicatiebeheerder",
      "company_name": "Actium Assen",
      "location": {
        "city": null,
        "country": null,
        "longitude": null,
        "latitude": null
      },
      "start_date": "2013-02-01",
      "end_date": "2014-09-30",
      "classification": {
        "function_type": "Ict service and information managers",
        "function_type_id": 1567953,
        "function_cat": "Ict service and information managers",
        "function_cat_id": 1567269,
        "level": 5
      }
    },
    {
      "title": "Change Coördinator",
      "company_name": "KPN Consulting",
      "location": {
        "city": null,
        "country": null,
        "longitude": null,
        "latitude": null
      },
      "start_date": "2006-08-01",
      "end_date": "2012-09-30",
      "classification": {
        "function_type": "Securities and finance dealers and brokers",
        "function_type_id": 1567651,
        "function_cat": "Finance, securities and investment staff",
        "function_cat_id": 1567434,
        "level": 6
      }
    },
    {
      "title": "Coordinator Automatisering",
      "company_name": "Spinder Products",
      "location": {
        "city": null,
        "country": null,
        "longitude": null,
        "latitude": null
      },
      "start_date": "2000-01-01",
      "end_date": "2006-12-31",
      "classification": {
        "function_type": "Ict service and information managers",
        "function_type_id": 1567953,
        "function_cat": "Ict service and information managers",
        "function_cat_id": 1567269,
        "level": 6
      }
    }
  ],
  "skills": [
    {
      "skill": "business development",
      "skill_id": 972528,
      "skill_type_id": 34097811
    },
    {
      "skill": "Automatisering",
      "skill_id": 1588585,
      "skill_type_id": 954000
    }
  ],
  "languages": [
    {
      "language": "Dutch",
      "proficiency": "native or bilingual proficiency"
    },
    {
      "language": "English",
      "proficiency": "professional working proficiency"
    },
    {
      "language": "German",
      "proficiency": "elementary proficiency"
    }
  ],
  "certificates": [],
  "working_years": 20
}

I want my output to be in the format :

skill
business development,Automatisering

Upvotes: 0

Views: 1520

Answers (1)

Paperpotato
Paperpotato

Reputation: 11

I take it from your question that you only want the names of each skill in the data frame.

The following code will get that if your JSON is in the file "data.json"

from pandas import DataFrame
import json

with open('data.json') as file:
    data = DataFrame([skill["skill"] for skill in json.loads(file.read())["skills"]])
print(data)

will print the following from the DataFrame "data"

                     0
0  business development
1        Automatisering

The 0th column in the data frame is the "skill" column from your JSON, but if you wanted a different column, such as "skill_id" just replace skill["skill"] with skill["skill_id"] in the code above.

If you just want all three columns and don't want to filter any out, the code is even shorter

from pandas import DataFrame
import json

with open('data.json') as file:
    data = DataFrame(json.loads(file.read())["skills"])
print(data)

I'm not sure why you need a DataFrame rather than just a normal list, especially considering you are just getting a list of string.

In the case that you just want a list of the names of the skills you can run

with open('data.json') as file:
    data = [skill["skill"] for skill in json.loads(file.read())["skills"]]
print(data)

I just removed the parts relating to the DataFrame.

JSON.loads takes in the file after you've opened it.

Upvotes: 1

Related Questions