Jambo
Jambo

Reputation: 65

Converting JSON to CSV in Python: List indices must be integers, not str

I'm trying to convert a JSON file to CSV using Python, but after a few error messages (and subsequent fixes...I think) I'm now receiving the following error:

TypeError: List indices must be integers, not str

I looked on similar threads and it seems this error is easily fixable, but using the advices on other threads, I still can't get it to work.

The code I'm trying looks like the below (only there are hundreds more rows, I just cleaned the below example up a little)

import csv
import json

x= r"""[
[{"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], "post": ["example message", "\n", "\nexample message"], "post_thread_url": ["http://www.examplewebsite.com/message1"], "post_symbol": ["EG"], "post_title": ["Example Title"]}]
]"""

x = json.loads(x.replace('\n', ''))

f = csv.writer(open("filename.csv", "wb+"))

f.writerow(["post_header", "post", "post_thread_url", "post_symbol", "post_title"])

for x in x:
    f.writerow([x["post_header"],
                x["post"],
                x["post_thread_url"],
                x["post_symbol"],
                x["post_title"]])

Upvotes: 1

Views: 894

Answers (3)

Vivek Sable
Vivek Sable

Reputation: 10223

Our input is list-->list-->dictionary

means we have to Iterate main List and pick first value from the item.

data = r"""[
    [
        {"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], "post": ["example message", "\n", "\nexample message"], "post_thread_url": ["http://www.examplewebsite.com/message1"], "post_symbol": ["EG"], "post_title": ["Example Title"]}
    ]
]"""

Use variable names correctly, do not create same name variable.

Demo:

import csv
import json

data = r"""[
    [
        {"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], "post": ["example message", "\n", "\nexample message"], "post_thread_url": ["http://www.examplewebsite.com/message1"], "post_symbol": ["EG"], "post_title": ["Example Title"]}
    ]
]"""

data_list = json.loads(data.replace('\n', ''))

#- Open file by "with" statement (so no need to close file i.e. fp.close() ) 
with open("filename.csv", "wb+") as fp:  
    # Create CSV file object.
    root = csv.writer(fp)
    #- Write first row in CSV file.
    root.writerow(["post_header", "post", "post_thread_url", "post_symbol", "post_title"])
    #- Iterate every item from the Data list:
    for i in data_list:
        # As Item i is again list, so pick first element from the list which is type dictionary.
        # i  >>>is list
        # i[0] >>> is dictionary 
        #  i[0]["post_header"] >> get value of post_header
        root.writerow([i[0]["post_header"],
                    i[0]["post"],
                    i[0]["post_thread_url"],
                    i[0]["post_symbol"],
                    i[0]["post_title"]])

Upvotes: 1

Martijn Pieters
Martijn Pieters

Reputation: 1121924

Your JSON contains a nested list:

x= r"""[
[{"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], "post": ["example message", "\n", "\nexample message"], "post_thread_url": ["http://www.examplewebsite.com/message1"], "post_symbol": ["EG"], "post_title": ["Example Title"]}]
]"""

There is a [...] pair on the outside, then another such pair around the dictionary.

Either unwrap that outer list from your structure, or loop over it to handle each nested list, if there are more such lists in the nested data.

Next, each value in the innermost dictionary is itself a list with just one element. Try to avoid writing those lists; instead if that is the normal pattern, write the one string element in each list element.

Last, but not least, don't re-use variable names for two different tasks; you should not be using x for the original JSON string, the entire data structure and the loop variable, for example.

This works:

x= r"""[
[{"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], "post": ["example message", "\n", "\nexample message"], "post_thread_url": ["http://www.examplewebsite.com/message1"], "post_symbol": ["EG"], "post_title": ["Example Title"]}]
]"""

fields = ["post_header", "post", "post_thread_url", "post_symbol", "post_title"]

data = json.loads(x)
with open("filename.csv", "wb")) as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()

    for row in data:
        for entry in row:
            writer.writerow({key: value[0] for key, value in entry.iteritems()})

Other changes I made:

  • Using a DictWriter is easier here as it saves repeating yourself naming all the keys.
  • No need to remove newlines; JSON accepts extra newlines

Upvotes: 1

martineau
martineau

Reputation: 123463

Your JSON describes a list of a list of dictionaries, not a dictionary, and you can't index lists with strings.

import json

x= r"""[
[{"post_header": ["username - 09 Apr 2015 - 19:58:55 - 1 of 6"], 
  "post": ["example message", "\n", "\nexample message"],
  "post_thread_url": ["http://www.examplewebsite.com/message1"], 
  "post_symbol": ["EG"], "post_title": ["Example Title"]}]
]"""

x = json.loads(x.replace('\n', ''))
print json.dumps(x, indent=2)  # show what was loaded

Output:

[
  [
    {
      "post_header": [
        "username - 09 Apr 2015 - 19:58:55 - 1 of 6"
      ], 
      "post": [
        "example message", 
        "\n", 
        "\nexample message"
      ], 
      "post_thread_url": [
        "http://www.examplewebsite.com/message1"
      ], 
      "post_symbol": [
        "EG"
      ], 
      "post_title": [
        "Example Title"
      ]
    }
  ]
]

Upvotes: 1

Related Questions