Counter10000

Reputation: 524

Pandas Data Frame Merge

I am new to Pandas. I am trying to make a data set with ZIP Code, Population in that ZIP Code, and Number of Counties in the ZIP Code.

I got the data from the Census website: https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt

I tried the following code, but it does not work. Could you help me figure out the correct code? I have a hunch that the error is related to the data frame or to a data type, but I cannot work out how to fix it. Please let me know your thoughts. Thank you in advance!

import pandas as pd

df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])

zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)

zcta_ct_county = df['ZCTA5'].value_counts()

zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']

pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]

Here is my error message:

Traceback (most recent call last):    
File "<stdin>", line 1, in <module>    
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 58, in merge copy=copy, indicator=indicator)   
File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 473, in __init__ 'type {0}'.format(type(right)))    
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

SOLUTION

import pandas as pd
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
# One row per ZIP code with its population (axis=1 is passed by keyword; newer pandas rejects the positional form)
zcta_pop = df.drop_duplicates(subset=['ZCTA5', 'ZPOP']).drop(['STATE', 'COUNTY'], axis=1)
# value_counts() returns a Series; reset_index() turns it into a two-column DataFrame
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]

Upvotes: 1

Views: 452

Answers (1)

jezrael

Reputation: 863531

I think you need to add reset_index, because value_counts returns a Series, while merge needs a DataFrame with two columns:

zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
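
To illustrate the point, here is a minimal sketch on a small made-up frame rather than the real census file. The ZIP codes, county codes, and populations below are invented for the example; only the column names come from the question. It checks that value_counts gives a Series and that the frame produced by reset_index can be merged. The two columns are renamed by position, since the default names after reset_index can differ between pandas versions.

```python
import pandas as pd

# Toy stand-in for zcta_county_rel_10.txt (all values are made up).
df = pd.DataFrame({
    'ZCTA5': ['00601', '00601', '00602', '00603', '00603', '00603'],
    'COUNTY': ['001', '141', '003', '005', '071', '099'],
    'ZPOP': [18570, 18570, 41520, 54689, 54689, 54689],
})

# value_counts() returns a Series indexed by ZIP code, not a DataFrame.
counts = df['ZCTA5'].value_counts()
print(type(counts).__name__)

# reset_index() moves the ZIP codes out of the index into a regular column,
# producing a two-column DataFrame. Rename both columns by position.
zcta_ct_county = counts.reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']

# One row per ZIP code with its population.
zcta_pop = df[['ZCTA5', 'ZPOP']].drop_duplicates()

# Both sides are DataFrames now, so the merge no longer raises ValueError.
merged = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')
print(merged)
```

Each ZIP code should end up on one row with its population and the number of county rows it appeared in.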

Upvotes: 1
