RageAgainstheMachine
RageAgainstheMachine

Reputation: 949

How to join Pandas data frames in python

I'm having some issues with joining dataframes in python. The code is supposed to create a dataframe for each US state and put it into a main dataframe.

I think the problem is coming from the column titles. When I print them I get this:

Index([u'0AL', u'0', u'0AZ', u'0', u'0CO', u'0', u'0DE', u'0', u'0GA', u'0',
       u'0IA', u'0', u'0IL', u'0', u'0KS', u'0', u'0LA', u'0', u'0MD', u'0',
       u'0MI', u'0', u'0MO', u'0', u'0MT', u'0', u'0ND', u'0', u'0NH', u'0',
       u'0NM', u'0', u'0NY', u'0', u'0OK', u'0', u'0PA', u'0', u'0SC', u'0',
       u'0TN', u'0', u'0UT', u'0', u'0VT', u'0', u'0WI', u'0', u'0WY', u'0'],
      dtype='object')

When I print the dataframe I get this:

            0AL    0  0AZ  0  0CO    0  0DE    0  0GA    0 ...   0TN    0  \
2017-01-01  6.5  6.4  3.8  5  5.2  2.9  4.5  4.4    5  5.5 ...   2.9  5.4

            0UT    0  0VT    0  0WI    0  0WY    0
2017-01-01  4.8  3.1    4  3.1  5.1  3.9  5.6  4.8

[1 rows x 50 columns]

It seems to be adding "0" everywhere as well as replacing some of the column headers (States). Initially, I was grabbing the States abbreviations from Wikipedia but then thought I would try and input them manually but that didn't help. Thanks in advance.

The purpose of this is to be able to rank the different states based on certain criteria.

Here is the code:

from fredapi import Fred
fred = Fred(api_key="XXXXXXXXXXXXXXXXX") 
import matplotlib.pyplot as plt
from IPython.core.pylabtools import figsize
import datetime
import pandas as pd
import time
import html5lib
import numpy as np

#html= pd.read_html('https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States')
#us_states= html[0][1][2:]

states =['AK','AL','AR','AZ','CA','CO','CT','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH', 'NJ','NM', 'NV','NY','OH','OK','OR','PA','RI','SC', 'SD','TN','TX','UT','VA','VT','WA','WI','WV',"WY"]

start = datetime.datetime(2017,1,1)
end = datetime.datetime(2017,1,1)

figsize(40, 5)

main_df = pd.DataFrame()

for abbv in states:

    df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end))

    if main_df.empty:
        main_df = df

    else:
        main_df = main_df.join(df, lsuffix=abbv)


print main_df

The first dataframe looks like this:

             0
2017-01-01  6.5

Upvotes: 2

Views: 128

Answers (1)

TLOwater
TLOwater

Reputation: 638

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.join.html

The documentation says that you are using a suffix (i.e. something that comes after a word) after column headers where they're overlapping. This doesn't mean the column header is set to abbv - instead you get this behaviour where you get 0NY for example.

If you want to set the column headers to your states set it with df.rename as so:

df = df.rename(columns={'0': abbv})

Then do the join.

EDIT: Alternatively set the header to save a rename as so:

df= pd.DataFrame(fred.get_series(series_id=str(abbv)+"UR",observation_start=start, observation_end=end), columns=[abbv])

Upvotes: 2

Related Questions