Extracting data from JSON File to CSV

Question

I have a big JSON file with a very complex structure

you can look on it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing

it contains more than 7 millions lines, and I want to extract only the "text" field

I have written a python code, to extra all the values of the "text" key or field in the whole file, and it extracted only 12 values! while when I open the JSON file on the Visualstudio, I have more than 19000 values!!

you can see the code here:

import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)

fname = "outputText8.csv"

with open(fname, "w") as file:
    csv_file = csv.writer(file,lineterminator='
')
    csv_file.writerow(["text"])
    for item in data[i]["turns"]:
        csv_file.writerow([item['text']])

please take a look on the JSON file as it is very large one and with a complex structure, so I an not paste it here to see because it would be not understandable

also this is a part of the son file:

[
    {
        "user_id": "U22HTHYNP",
        "turns": [
            {
                "text": "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
                "labels": {
                    "acts": [
                        {
                            "args": [
                                {
                                    "val": "book",
                                    "key": "intent"
                                }
                            ],
                            "name": "inform"
                        },
                        {
                            "args": [
                                {
                                    "val": "Atlantis",
                                    "key": "dst_city"
                                },
                                {
                                    "val": "Caprica",
                                    "key": "or_city"
                                },
                                {
                                    "val": "Saturday, August 13, 2016",
                                    "key": "str_date"
                                },
                                {
                                    "val": "8",
                                    "key": "n_adults"
                                },
                                {
                                    "val": "1700",
                                    "key": "budget"
                                }
                            ],
                            "name": "inform"
                        }
                    ],
                    "acts_without_refs": [
                        {
                            "args": [
                                {
                                    "val": "book",
                                    "key": "intent"
                                }
                            ],
                            "name": "inform"
                        },
                        {
                            "args": [
                                {
                                    "val": "Atlantis",
                                    "key": "dst_city"
                                },
                                {
                                    "val": "Caprica",
                                    "key": "or_city"
                                },
                                {
                                    "val": "Saturday, August 13, 2016",
                                    "key": "str_date"
                                },
                                {
                                    "val": "8",
                                    "key": "n_adults"
                                },
                                {
                                    "val": "1700",
                                    "key": "budget"
                                }
                            ],
                            "name": "inform"
                        }
                    ],
                    "active_frame": 1,
                    "frames": [
                        {
                            "info": {
                                "intent": [
                                    {
                                        "val": "book",
                                        "negated": false
                                    }
                                ],
                                "budget": [
                                    {
                                        "val": "1700.0",
                                        "negated": false
                                    }
                                ],
                                "dst_city": [
                                    {
                                        "val": "Atlantis",
                                        "negated": false
                                    }
                                ],
                                "or_city": [
                                    {
                                        "val": "Caprica",
                                        "negated": false
                                    }
                                ],
                                "str_date": [
                                    {
                                        "val": "august 13",
                                        "negated": false
                                    }
                                ],
                                "n_adults": [
                                    {
                                        "val": "8",
                                        "negated": false
                                    }
                                ]
                            },
                            "frame_id": 1,
                            "requests": [],
                            "frame_parent_id": null,
                            "binary_questions": [],
                            "compare_requests": []
                        }
                    ]
                },
                "author": "user",
                "timestamp": 1471272019730.0
            },
            {
                "db": {
                    "result": [
                        [
                            {
                                "trip": {
                                    "returning": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 24,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 24,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "seat": "ECONOMY",
                                    "leaving": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 0,
                                            "year": 2016,
                                            "day": 16,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 0,
                                            "year": 2016,
                                            "day": 16,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "or_city": "Porto Alegre",
                                    "duration_days": 9
                                },
                                "price": 2118.81,
                                "hotel": {
                                    "gst_rating": 7.15,
                                    "vicinity": [],
                                    "name": "Scarlet Palms Resort",
                                    "country": "Brazil",
                                    "amenities": [
                                        "FREE_BREAKFAST",
                                        "FREE_PARKING",
                                        "FREE_WIFI"
                                    ],
                                    "dst_city": "Goiania",
                                    "category": "3.5 star hotel"
                                }
                            },
                            {
                                "trip": {
                                    "returning": {
                                        "duration": {
                                            "hours": 2,
                                            "min": 37
                                        },
                                        "arrival": {
                                            "hour": 12,
                                            "year": 2016,
                                            "day": 10,
                                            "min": 37,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 10,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "seat": "ECONOMY",
                                    "leaving": {
                                        "duration": {
                                            "hours": 2,
                                            "min": 37
                                        },
                                        "arrival": {
                                            "hour": 0,
                                            "year": 2016,
                                            "day": 4,
                                            "min": 37,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 22,
                                            "year": 2016,
                                            "day": 3,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "or_city": "Porto Alegre",
                                    "duration_days": 7
                                },
                                "price": 2369.83,
                                "hotel": {
                                    "gst_rating": 0,
                                    "vicinity": [],
                                    "name": "Sunway Hostel",
                                    "country": "Argentina",
                                    "amenities": [
                                        "FREE_BREAKFAST",
                                        "FREE_WIFI"
                                    ],
                                    "dst_city": "Rosario",
                                    "category": "2.0 star hotel"
                                }
                            },
                            {
                                "trip": {
                                    "returning": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 24,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 24,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "seat": "BUSINESS",
                                    "leaving": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 0,
                                            "year": 2016,
                                            "day": 16,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 0,
                                            "year": 2016,
                                            "day": 16,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "or_city": "Porto Alegre",
                                    "duration_days": 9
                                },
                                "price": 2375.72,
                                "hotel": {
                                    "gst_rating": 7.15,
                                    "vicinity": [],
                                    "name": "Scarlet Palms Resort",
                                    "country": "Brazil",
                                    "amenities": [
                                        "FREE_BREAKFAST",
                                        "FREE_PARKING",
                                        "FREE_WIFI"
                                    ],
                                    "dst_city": "Goiania",
                                    "category": "3.5 star hotel"
                                }
                            },
                            {
                                "trip": {
                                    "returning": {
                                        "duration": {
                                            "hours": 1,
                                            "min": 30
                                        },
                                        "arrival": {
                                            "hour": 11,
                                            "year": 2016,
                                            "day": 1,
                                            "min": 30,
                                            "month": 9
                                        },
                                        "departure": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 1,
                                            "min": 0,
                                            "month": 9
                                        }
                                    },
                                    "seat": "BUSINESS",
                                    "leaving": {
                                        "duration": {
                                            "hours": 1,
                                            "min": 30
                                        },
                                        "arrival": {
                                            "hour": 18,
                                            "year": 2016,
                                            "day": 19,
                                            "min": 30,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 17,
                                            "year": 2016,
                                            "day": 19,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "or_city": "Porto Alegre",
                                    "duration_days": 13
                                },
                                "price": 2492.95,
                                "hotel": {
                                    "gst_rating": 0,
                                    "vicinity": [],
                                    "name": "Hotel Mundo",
                                    "country": "Brazil",
                                    "amenities": [
                                        "FREE_BREAKFAST",
                                        "FREE_WIFI",
                                        "FREE_PARKING"
                                    ],
                                    "dst_city": "Manaus",
                                    "category": "2.5 star hotel"
                                }
                            },
                            {
                                "trip": {
                                    "returning": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 31,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 10,
                                            "year": 2016,
                                            "day": 31,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "seat": "ECONOMY",
                                    "leaving": {
                                        "duration": {
                                            "hours": 0,
                                            "min": 51
                                        },
                                        "arrival": {
                                            "hour": 19,
                                            "year": 2016,
                                            "day": 27,
                                            "min": 51,
                                            "month": 8
                                        },
                                        "departure": {
                                            "hour": 19,
                                            "year": 2016,
                                            "day": 27,
                                            "min": 0,
                                            "month": 8
                                        }
                                    },
                                    "or_city": "Porto Alegre",
                                    "duration_days": 4
                                },
                                "price": 2538.0,
                                "hotel": {
                                    "gst_rating": 8.22,
                                    "vicinity": [],
                                    "name": "The Glee",
                                    "country": "Brazil",
                                    "amenities": [
                                        "FREE_BREAKFAST",
                                        "FREE_WIFI"
                                    ],
                                    "dst_city": "Recife",
                                    "category": "4.0 star hotel"
                                }
                            }
                        ],
                        [],
                        [],
                        [],
                        [],
                        [],
                        []
                    ],
                    "search": [
                        {
                            "ORIGIN_CITY": "Porto Alegre",
                            "PRICE_MIN": "2000",
                            "NUM_ADULTS": "2",
                            "timestamp": 1471271949.995,
                            "PRICE_MAX": "3000",
                            "ARE_DATES_FLEXIBLE": "true",
                            "NUM_CHILDREN": "5",
                            "START_TIME": "1470110400000",
                            "MAX_DURATION": 2592000000.0,
                            "DESTINATION_CITY": "Brazil",
                            "RESULT_LIMIT": "10",
                            "END_TIME": "1472616000000"
                        },
                        {
                            "ORIGIN_CITY": "Atlantis",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272148.124,
                            "PRICE_MAX": "1700",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "NaN",
                            "END_TIME": "NaN"
                        },
                        {
                            "ORIGIN_CITY": "Caprica",
                            "PRICE_MAX": "1700",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272189.07,
                            "DESTINATION_CITY": "Atlantis",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "1470715200000",
                            "END_TIME": "1472011200000"
                        },
                        {
                            "ORIGIN_CITY": "Caprica",
                            "PRICE_MAX": "1700",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272205.436,
                            "DESTINATION_CITY": "Atlantis",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "1470715200000",
                            "END_TIME": "1472011200000"
                        },
                        {
                            "ORIGIN_CITY": "Caprica",
                            "PRICE_MIN": "1700",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272278.72,
                            "DESTINATION_CITY": "Atlantis",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "1470715200000",
                            "END_TIME": "1472011200000"
                        },
                        {
                            "ORIGIN_CITY": "Caprica",
                            "PRICE_MIN": "1700",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272454.542,
                            "DESTINATION_CITY": "Atlantis",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "1471060800000",
                            "END_TIME": "1472011200000"
                        },
                        {
                            "ORIGIN_CITY": "Caprica",
                            "PRICE_MIN": "1700",
                            "NUM_ADULTS": "8",
                            "RESULT_LIMIT": "10",
                            "timestamp": 1471272466.008,
                            "DESTINATION_CITY": "Atlantis",
                            "NUM_CHILDREN": "",
                            "ARE_DATES_FLEXIBLE": "true",
                            "START_TIME": "1471060800000",
                            "END_TIME": "1472011200000"
                        }
                    ]
                },

How it could be modified to extract all the "text" values from the JSON file to a CSV file?

Walid · Accepted Answer

This is a potential solution using pandas:

import pandas as pd
#importing data
dj = pd.read_json("frames2.json")
dtext = dj[["user_id","turns"]]
#Saving text records in a list
list_ = []
for record in dtext["turns"].values:
  for r  in record:
    list_.append(r["text"])
#Exporting the csv
out = pd.Series(list_,name="text")
out.to_csv("text.csv")

It gives the following output.

Extracting data from JSON File to CSV

Answers (2)

Related Questions