Reputation: 18227
I'm trying to clean up some dirty data. Each row is supposed to have a PhraseID, but for weird reasons, after massaging and merging several dataframes, I ended up with a column name of '2'.
But I'm having trouble getting the value of that field. If I try row['2'] I get a Python KeyError. Are all rows not guaranteed to have the same columns? So now I check whether the key '2' is in df.columns, and it seems it is not.
In the code below, altPhraseID gets set to "column 2 not found". I know the problem data is around index 561 and 562, so I've added extra debug output there.
for index, row in df_master_combined.iterrows():
    phraseID = row['PhraseId']
    phrase = row['PhraseMaster']
    filename = row['Filename']
    #altPhraseID = row['PhraseId2']
    if '2' in df_master_combined.columns:
        altPhraseID = row['2']
    else:
        altPhraseID = "column 2 not found"
    if 560 < index < 563:
        print("index=" + str(index) +
              " phraseID=" + str(phraseID) +
              " phrase=" + str(phrase) +
              " filename=" + str(filename) +
              " altPhraseID=", str(altPhraseID))
        subcounter = 0
        for col in df_master_combined.columns:
            subcounter += 1
            print(" Test:" + str(subcounter) +
                  " fieldname=" + str(col) +
                  " value=" + str(row[col]))
However, the second loop prints the output below, which shows I can access the value 562.0 in this manner. I don't understand why I can't access it just using row['2'].
Test:1 fieldname=Language value=0.0
Test:2 fieldname=Phrase value=0.0
Test:3 fieldname=PhraseId value=0
Test:4 fieldname=PhraseMaster value=$ Value - Lot
Test:5 fieldname=2 value=562.0
Test:6 fieldname=Filename value=phrase010000.xlsx
Test:7 fieldname=LanguageMaster value=EN
I also tried renaming it with both of the following:
df_master_combined.rename(columns={4: 'PhraseId2'}, inplace=True)
df_master_combined.rename(columns={'2': 'PhraseId2'}, inplace=True)
But before/after the rename, the data looks the same:
Data type of each column of Dataframe :df_master_combined
Language          float64
Phrase            float64
PhraseId           object
PhraseMaster       object
2                 float64
Filename           object
LanguageMaster     object
dtype: object
Upvotes: 0
Views: 436
Reputation: 861
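The column name can be an int, and that appears to be the case here: the label is the integer 2, not the string '2'. str(col) prints both the same way, which is why your debug loop looks right while '2' in df_master_combined.columns is False and row['2'] raises a KeyError.
You could rename it with: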
df_master_combined.rename(columns={2: 'PhraseId2'}, inplace=True)
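To confirm that the label really is an integer before renaming, a minimal sketch along these lines may help. The small frame below is a hypothetical stand-in for df_master_combined (only a few of its columns, one made-up row), and converting every label to a string at the end is an optional alternative, not something the question requires.

```python
import pandas as pd

# Hypothetical miniature of the merged frame; the integer label 2
# stands in for the stray column from the question.
df = pd.DataFrame({
    "PhraseId": ["0"],
    "PhraseMaster": ["$ Value - Lot"],
    2: [562.0],
    "Filename": ["phrase010000.xlsx"],
})

# repr() keeps the quotes on string labels, so the integer label stands out;
# str() would print 2 and '2' identically.
labels = [repr(c) for c in df.columns]
print(labels)

has_str_label = "2" in df.columns  # string lookup misses the integer label
has_int_label = 2 in df.columns    # integer lookup finds it

# Access by the integer label works where row['2'] raised KeyError.
row = df.iloc[0]
value = row.loc[2]

# Rename using the integer key, as suggested above.
renamed = df.rename(columns={2: "PhraseId2"})

# Optional alternative: make every label a string so '2' works afterwards.
all_str = df.copy()
all_str.columns = all_str.columns.map(str)
```

If repr() shows the label with quotes after all, the cause is something else (for example stray whitespace in the name) and the integer rename will silently do nothing.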
Upvotes: 1