NealWalters

Reputation: 18227

Pandas Dataframe - trouble accessing a column with a numeric name

I'm trying to clean up some dirty data. Each row is supposed to have a PhraseID, but after massaging and merging several dataframes, I ended up with a column named '2'.

I'm having trouble getting the value of that field. If I try row['2'], I get a Python KeyError. Aren't all rows guaranteed to have the same columns? I now check whether the key '2' is in df.columns, and it seems it is not.

In the code below, altPhraseID gets set to "column 2 not found". I know the problem data is around index 561 and 562, so I've added extra debug output there.

for index, row in df_master_combined.iterrows():
    phraseID = row['PhraseId']
    phrase = row['PhraseMaster']
    filename = row['Filename']
    #altPhraseID = row['PhraseId2']
    if '2' in df_master_combined.columns:
        altPhraseID = row['2']
    else:
        altPhraseID = "column 2 not found"

    if 560 < index < 563:
        print("index=" + str(index) +
              " phraseID=" + str(phraseID) +
              " phrase=" + str(phrase) +
              " filename=" + str(filename) +
              " altPhraseID=" + str(altPhraseID))
        subcounter=0
        for col in df_master_combined.columns:
            subcounter += 1
            print(" Test:" + str(subcounter) +
                  " fieldname=" + str(col) +
                  " value=" + str(row[col]))

However, the second loop prints the output below, which shows that I can access the value 562.0 by iterating over df.columns. I don't understand why I can't access it with just row['2'].

 Test:1 fieldname=Language value=0.0
 Test:2 fieldname=Phrase value=0.0
 Test:3 fieldname=PhraseId value=0
 Test:4 fieldname=PhraseMaster value=$ Value - Lot
 Test:5 fieldname=2 value=562.0
 Test:6 fieldname=Filename value=phrase010000.xlsx
 Test:7 fieldname=LanguageMaster value=EN

I also tried renaming it with both of the following:

df_master_combined.rename(columns={4: 'PhraseId2'}, inplace=True)
df_master_combined.rename(columns={'2': 'PhraseId2'}, inplace=True)

But the column dtypes look the same before and after the rename:

Data type of each column of Dataframe :df_master_combined

Language          float64
Phrase            float64
PhraseId           object
PhraseMaster       object
2                 float64
Filename           object
LanguageMaster     object
dtype: object

Upvotes: 0

Views: 436

Answers (1)

KJDII

Reputation: 861

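The column name can be an int. Your column is most likely labelled with the integer 2 rather than the string '2'. Both print as 2, which would explain why '2' in df_master_combined.columns is False and row['2'] raises a KeyError.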
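One way to check this is to look at the types of the column labels rather than how they print. This is only a sketch on a small made-up frame (`df` and its values are illustrative, not your actual data):

```python
import pandas as pd

# Hypothetical frame: one column label is the integer 2, not the string '2'.
df = pd.DataFrame({"PhraseId": ["0"], 2: [562.0], "Filename": ["phrase010000.xlsx"]})

# Printing a label hides its type; type() reveals it.
label_types = {str(col): type(col).__name__ for col in df.columns}
print(label_types)

has_str_label = "2" in df.columns  # lookup with the string '2'
has_int_label = 2 in df.columns    # lookup with the integer 2
print(has_str_label, has_int_label)

# .loc is always label-based, so this looks up the label 2, not position 2.
row = df.iloc[0]
value = row.loc[2]
print(value)
```

If the label types show an int for that column, then row[2] (or row.loc[2]) inside your iterrows loop should return the value you are after.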

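Neither of your rename attempts used a key that matches an existing label, and rename ignores unknown keys by default, so nothing changed. You could rename with: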

df_master_combined.rename(columns={2: 'PhraseId2'}, inplace=True)
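If the merges can produce other non-string labels, another option is to convert every label to a string up front. A minimal sketch of both approaches, again on a made-up frame (the names `df`, `renamed` and `as_str` are only for illustration):

```python
import pandas as pd

# Hypothetical frame with an integer column label.
df = pd.DataFrame({"PhraseId": ["0"], 2: [562.0]})

# Option 1: rename the integer label directly (returns a new frame).
renamed = df.rename(columns={2: "PhraseId2"})
print(list(renamed.columns))

# Option 2: convert every label to a string, after which '2' works as a key.
as_str = df.copy()
as_str.columns = as_str.columns.map(str)
print(list(as_str.columns))
print(as_str["2"].iloc[0])
```

Option 2 avoids having to guess the type of each label, at the cost of changing all of them at once.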

Upvotes: 1
