Reputation: 693
After one-hot encoding 2 different features and joining the resulting one-hot encoded columns with the original dataframe in Pandas, I have 3 dataframes.
The first is OneHotZips (which contains my one-hot encoded feature #1). The second is OneHotYearBuilt (same thing, i.e. my feature #2 as one hot encoded columns in a dataframe). Last, I have subset, which is the previous two joined with the original dataframe. More concretely, subset.keys() is:
Index(['lat_z', 'lon_z', 'price_z', 'lot_z', 'LotSizeSquareFeet',
'TotalBedrooms', 'NormalizedBathCount', 'PropertyAddressLatitude',
'PropertyAddressLongitude', 'MonthsToSale',
...
'year_built_2008.0', 'year_built_2009.0', 'year_built_2010.0',
'year_built_2011.0', 'year_built_2012.0', 'year_built_2013.0',
'year_built_2014.0', 'year_built_2015.0', 'year_built_2016.0',
'year_built_2017.0'],
dtype='object', length=477)
I would like to use only some of these columns in a new dataframe, called downsampled_z.
I have been able to build a single string containing the quoted column names with
'"' + '", "'.join(list(OneHotZips.columns.values)) + '"'
It looks like:
'"year_built_1882.0", "year_built_1900.0", ... "year_built_2017.0"'
Which seems to be the way I want it, but the following doesn't work:
downsampled_z = subset[["lat_z", "lon_z", "price_z", "lot_z", "TotalBedrooms", "NormalizedBathCount", "built_prct",
'"' + '", "'.join(list(OneHotZips.columns.values)) + '"',
'"' + '", "'.join(list(OneHotYearBuilt.columns.values)) + '"']]
This results in a KeyError:
'[\'"year_built_1882.0", "year_built_1900.0", ... "year_built_2017.0"\'] not in index
Other approaches I have taken such as
[str(x) for x in list(OneHotZips.columns.values)]
result in
ValueError: setting an array element with a sequence
Upvotes: 0
Views: 55
Reputation: 8683
That is because you are really creating one long string, which is not your column name. You can just use:
downsampled_cols = (
    ["lat_z", "lon_z", "price_z", "lot_z", "TotalBedrooms", "NormalizedBathCount", "built_prct"]
    + list(OneHotZips.columns)
    + list(OneHotYearBuilt.columns)
    # + ... any further column lists
)
And then,
downsampled_z = subset[downsampled_cols]
If you join a list of strings, you end up with a single string.
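To make the difference concrete, here is a minimal sketch. The tiny dataframes and their column names (`zip_98101`, `unused`, and so on) are made up for illustration and only stand in for the real `OneHotZips`, `OneHotYearBuilt`, and `subset`.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the frames in the question.
one_hot_zips = pd.DataFrame({"zip_98101": [1, 0], "zip_98102": [0, 1]})
one_hot_year_built = pd.DataFrame(
    {"year_built_2016.0": [1, 0], "year_built_2017.0": [0, 1]}
)
base = pd.DataFrame({"lat_z": [0.1, -0.3], "price_z": [1.2, 0.4], "unused": [7, 8]})
subset = base.join(one_hot_zips).join(one_hot_year_built)

# str.join produces ONE string, so pandas looks for a single column
# whose name is that whole string and fails to find it.
joined = '"' + '", "'.join(one_hot_zips.columns) + '"'
try:
    subset[["lat_z", joined]]
    join_failed = False
except KeyError:
    join_failed = True

# Concatenating lists keeps every column name as its own list element.
downsampled_cols = (
    ["lat_z", "price_z"]
    + list(one_hot_zips.columns)
    + list(one_hot_year_built.columns)
)
downsampled_z = subset[downsampled_cols]
```

Here `joined` is a single `str`, while `downsampled_cols` is a flat list of six separate names, which is the form that `subset[...]` expects.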
I think the underlying confusion is about how Python displays strings. print() shows a value without any type information: print('abc') gives abc (no quotes), and print(123) gives 123 (also no quotes). repr() shows the quotes, or the lack of them. The quotes are only part of how a string is written or displayed, not part of its value, so you don't need to add them to your column names by hand. The names in OneHotZips.columns are already strings. You can, of course, store numbers as strings, which is where some confusion could arise.
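A short sketch of the print/repr distinction, using a sample column name from the question. The variable names are only illustrative.

```python
name = "year_built_2017.0"   # a column name: already a str
number_as_text = "123"       # digits stored as a str
number = 123                 # an actual int

shown = str(name)            # what print(name) displays
quoted = repr(name)          # what repr shows, quotes included

print(name)                  # displays the text without quotes
print(repr(name))            # displays the text with quotes
print(number, number_as_text)          # both display as 123
print(repr(number), repr(number_as_text))  # repr reveals which one is a str
```

The quotes appear only in the repr form; the string itself never contains them, so there is nothing to add before using it as a column label.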
Upvotes: 1