Reputation: 185
I am trying to filter values from a column in a pandas Dataframe but I seem to be receiving booleans instead of actual values. I am trying to filter our data by month and year. In the code following below you will see I am only filtering out by year but I have tried month and year multiple times in different ways:
In [1]: import requests
In [2]: import pandas as pd # pandas
In [3]: import datetime as dt # module for manipulating dates and times
In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"
In [5]: source = requests.get(url).text
In [6]: from io import StringIO, BytesIO
In [7]: s = StringIO(source)
In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)
In [9]: election_data.head(n=3)
Out[9]:
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
In [10]: start_date = pd.Series(election_data["Start Date"])
...: start_date.head(n=3)
...:
Out[10]:
0 2012-11-04
1 2012-11-03
2 2012-11-03
Name: Start Date, dtype: datetime64[ns]
In [11]: filtered = start_date.map(lambda x: x.year == 2012)
In [12]: filtered
Out[12]:
0 True
1 True
2 True
...
587 False
588 False
589 False
Name: Start Date, dtype: bool
Upvotes: 1
Views: 362
Reputation: 862661
I think you need read_csv
with url
address first and then boolean indexing
with mask created by year
and month
:
election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])
print (election_data.head(3))
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
print (election_data.dtypes)
Pollster object
Start Date datetime64[ns]
End Date datetime64[ns]
Entry Date/Time (ET) datetime64[ns]
Number of Observations float64
Population object
Mode object
Obama float64
Romney float64
Undecided float64
Other float64
Pollster URL object
Source URL object
Partisan object
Affiliation object
Question Text float64
Question Iteration int64
dtype: object
election_data[election_data["Start Date"].dt.year == 2012]
election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]
Upvotes: 3
Reputation: 294258
you can use pandas date filtering if you make Start Date
the index
get all of 2012
election_data.set_index('Start Date')['2012']
get all of Jan, 2012
election_data.set_index('Start Date')['2012-01']
get all between Jan 1, 2012
and Jan 13, 2012
election_data.set_index('Start Date')['2012-01-01':'2012-01-13]
Upvotes: 3