Reputation: 43491
for petid in X['PetID']:
    sentiment_file = datapath + '/train_sentiment/' + petid + '.json'
    if os.path.isfile(sentiment_file):
        json_data = json.loads(open(sentiment_file).read())
        X['DescriptionLanguage'] = json_data['language']
        X['DescriptionMagnitude'] = json_data['documentSentiment']['magnitude']
        X['DescriptionScore'] = json_data['documentSentiment']['score']
        # print(petid, sentiment_file,
        #       json_data['documentSentiment']['magnitude'])
    else:
        X['DescriptionLanguage'] = 'Unknown'
        X['DescriptionMagnitude'] = 0
        X['DescriptionScore'] = 0
This is what I have, but it doesn't work: on each iteration it sets EVERY row to the same values for DescriptionLanguage, DescriptionMagnitude and DescriptionScore, instead of only the row for the current PetID.
Upvotes: 0
Views: 25
Reputation: 10841
In addition to @Heikki Pulkkinen's excellent answer, you can also index the individual columns in the data frame, e.g.:
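import pandas as pd
import numpy as np

# Build a 10x4 frame in which every column holds 0..9
data = np.array([np.arange(10)]*4).T
X = pd.DataFrame(data,columns=["PetID","DescriptionLanguage","DescriptionMagnitude","DescriptionScore"])
for i in range(len(X['PetID'])):
    # Set one cell at a time. Chained indexing like this can trigger
    # SettingWithCopyWarning and may not write through in newer pandas;
    # X.loc[i, 'DescriptionLanguage'] = 10*i is the safer form.
    X['DescriptionLanguage'][i] = 10*i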
... which results in X becoming:
   PetID  DescriptionLanguage  DescriptionMagnitude  DescriptionScore
0      0                    0                     0                 0
1      1                   10                     1                 1
2      2                   20                     2                 2
3      3                   30                     3                 3
4      4                   40                     4                 4
5      5                   50                     5                 5
6      6                   60                     6                 6
7      7                   70                     7                 7
8      8                   80                     8                 8
9      9                   90                     9                 9
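To see why the loop in the question overwrites every row, it may help to compare the two kinds of assignment side by side. This is only a small sketch on a made-up three-row frame (the PetID values and scores are placeholders, not real data); it uses `.at`, the pandas accessor for a single cell.

```python
import pandas as pd

# Made-up frame for illustration only
X = pd.DataFrame({"PetID": ["a", "b", "c"]})

# Assigning a scalar to a column broadcasts it to every row,
# which is what each iteration of the loop in the question does.
X["DescriptionScore"] = 0.5

# Assigning through a row label and a column name changes one cell only.
X.at[1, "DescriptionScore"] = 0.9

scores = X["DescriptionScore"].tolist()
print(scores)
```

Only the row with label 1 differs from the broadcast value afterwards.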
Upvotes: 2
Reputation: 277
You can use .loc to set an individual value instead of a whole column. Here is a self-contained example:
import pandas as pd
import numpy as np

X = pd.DataFrame(np.arange(5), columns=['PetID'])
for ind, row in X.iterrows():
    petid = row['PetID']
    X.loc[ind, 'DescriptionLanguage'] = 'No description for {}'.format(petid)
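Applied to the loop from the question, the same idea might look roughly like the sketch below. It is not tested against the real data set: the temporary directory and the sample JSON file stand in for `datapath + '/train_sentiment/'`, and the pet IDs are invented. The columns are first filled with the fallback values, then only the rows that have a sentiment file are overwritten through `.loc`.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for datapath + '/train_sentiment/'
sentiment_dir = tempfile.mkdtemp()
sample = {"language": "en", "documentSentiment": {"magnitude": 2.4, "score": 0.7}}
with open(os.path.join(sentiment_dir, "pet1.json"), "w") as f:
    json.dump(sample, f)

# Invented pet IDs: pet1 has a sentiment file, pet2 does not
X = pd.DataFrame({"PetID": ["pet1", "pet2"]})

# Whole-column assignment is fine for the defaults
X["DescriptionLanguage"] = "Unknown"
X["DescriptionMagnitude"] = 0.0
X["DescriptionScore"] = 0.0

for ind, petid in X["PetID"].items():
    sentiment_file = os.path.join(sentiment_dir, petid + ".json")
    if os.path.isfile(sentiment_file):
        with open(sentiment_file) as f:
            json_data = json.load(f)
        # .loc[row label, column name] touches only this pet's row
        X.loc[ind, "DescriptionLanguage"] = json_data["language"]
        X.loc[ind, "DescriptionMagnitude"] = json_data["documentSentiment"]["magnitude"]
        X.loc[ind, "DescriptionScore"] = json_data["documentSentiment"]["score"]

print(X)
```

Setting the defaults up front also removes the need for the else branch.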
Upvotes: 2