Forklift17

Reputation: 2477

Pandas Series from two-columned DataFrame produces a Series of NaN's

import pandas as pd

state_codes = pd.read_csv('name-abbr.csv', header=None)
state_codes.columns = ['State', 'Code']
codes = state_codes['Code']
states = pd.Series(state_codes['State'], index=state_codes['Code'])

name-abbr.csv is a two-column CSV file with US state names in the first column and postal codes in the second: "Alabama" and "AL" in the first row, "Alaska" and "AK" in the second, and so forth.

The above code sets the index correctly, but every value in the Series is NaN. If I don't set the index, the state names show up correctly. But I want both.

I also tried this line:

states = pd.Series(state_codes.iloc[:,0], index=state_codes.iloc[:,1])

Same result. How do I get this to work?

Upvotes: 1

Views: 49

Answers (1)

jezrael

Reputation: 862511

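The cause is index alignment. When you pass a Series as the data together with an explicit index, pandas reindexes that Series to the new index: it tries to match the labels in state_codes['State'].index (the default 0, 1, 2, ...) against the new labels from state_codes['Code'] ('AL', 'AK', ...). Because none of the labels match, every value in the output is missing. To prevent this, convert the Series to a NumPy array, which has no index to align: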

states = pd.Series(state_codes['State'].to_numpy(), index=state_codes['Code'])
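To see the alignment behaviour in isolation, here is a minimal sketch. It assumes a hypothetical two-row DataFrame standing in for name-abbr.csv; the names aligned and positional are only for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the contents of name-abbr.csv
state_codes = pd.DataFrame({'State': ['Alabama', 'Alaska'],
                            'Code': ['AL', 'AK']})

# Series as data: pandas reindexes it to the new labels. 'AL' and 'AK'
# do not exist in the default index (0, 1), so every value becomes NaN.
aligned = pd.Series(state_codes['State'], index=state_codes['Code'])

# NumPy array as data: there is no index to align, so values are
# assigned to the new labels by position.
positional = pd.Series(state_codes['State'].to_numpy(),
                       index=state_codes['Code'])

print(aligned.isna().all())   # True
print(positional.to_dict())   # {'AL': 'Alabama', 'AK': 'Alaska'}
```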

Or you can use DataFrame.set_index:

states = state_codes.set_index('Code')['State']
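As a quick check that both approaches give the same mapping, the sketch below again uses a hypothetical two-row DataFrame in place of the CSV file (via_numpy is an illustrative name). One small difference worth knowing: the set_index version keeps the column name 'State' as the Series name, while the NumPy version leaves the name unset.

```python
import pandas as pd

# Hypothetical two-row stand-in for the contents of name-abbr.csv
state_codes = pd.DataFrame({'State': ['Alabama', 'Alaska'],
                            'Code': ['AL', 'AK']})

# Move 'Code' into the index, then select the 'State' column
states = state_codes.set_index('Code')['State']

# The NumPy-based construction, for comparison
via_numpy = pd.Series(state_codes['State'].to_numpy(),
                      index=state_codes['Code'])

print(states['AK'])                            # Alaska
print(states.to_dict() == via_numpy.to_dict()) # True
print(states.name, via_numpy.name)             # State None
```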

Upvotes: 1
