Reputation: 845
For a current research project, I am planning to read the JSON field "Text Main" within a pre-defined time range using Python/Pandas. However, the code yields the error TypeError: string indices must be integers for the line line = row["Text Main"].
I have already gone through pages addressing the same issue but have not found a solution yet. Is there a tweak that would make this work?
The JSON file has the following structure:
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
And the corresponding code section looks like this:
import string
import json
import csv
import pandas as pd
import datetime
import numpy as np
# Loading and reading dataset
file = open("Glassdoor_A.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df['Date'] = pd.to_datetime(df['Date'])
# Create an empty dictionary
d = dict()
# Filtering by date
start_date = "01/01/2009"
end_date = "01/01/2015"
after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]
print(filtered_dates)
# Processing
for row in filtered_dates:
    line = row["Text Main"]
    # Remove the leading spaces and newline character
    line = line.strip()
Upvotes: 0
Views: 3197
Reputation: 1413
If the requirement is to collect all the contents of 'Text Main' column, this is what we can do:
line = list(filtered_dates['Text Main'])
We can then also apply strip to each value:
line = [val.strip() for val in line]
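As for why the error appears: iterating over a DataFrame directly yields its column labels (plain strings), not its rows, so row["Text Main"] ends up indexing a string with a string. Below is a minimal sketch of this, along with two ways to get the stripped text. The sample records are made up to mirror the structure in the question, and the month/day/year date format is an assumption on my part.

```python
import pandas as pd

# Hypothetical sample records mirroring the JSON structure in the question.
data = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "  Sample text \n"},
    {"No": "122", "Stock Symbol": "A", "Date": "03/15/2012", "Text Main": " Older text "},
]

df = pd.DataFrame(data)
# Assumption: dates are month/day/year; adjust the format if they are not.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Iterating over a DataFrame yields column labels (strings), not rows.
# Indexing such a string with "Text Main" is what raises the TypeError.
labels = [col for col in df]

# Filter by date range (both bounds inclusive).
mask = df["Date"].between(pd.Timestamp("2009-01-01"), pd.Timestamp("2015-01-01"))
filtered = df.loc[mask]

# Option 1: vectorised strip on the whole column.
stripped = filtered["Text Main"].str.strip().tolist()

# Option 2: explicit row-by-row loop, if per-row logic is needed.
lines = [row["Text Main"].strip() for _, row in filtered.iterrows()]
```

The vectorised .str.strip() version is usually the better fit for large files; iterrows() is mainly useful when each row needs more involved processing.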
Upvotes: 1