Reputation: 35
I have the following data:
data = {'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama', 'Wisconsin', 'Wisconsin'],
'year': [1989, 1989, 1989, 1989, 1990, 2016, 1970],
'quarter': [1, 2, 3, 4, 1, 4, 4],
'v': [3.984353, 4.427839, 4.173073, 3.485882, 3.865541, 0.168776, 0.168776]}
df = pd.DataFrame(data)
state year quarter v
0 Alabama 1989 1 3.984353
1 Alabama 1989 2 4.427839
2 Alabama 1989 3 4.173073
3 Alabama 1989 4 3.485882
4 Alabama 1990 1 3.865541
5 Wisconsin 2016 4 0.168776
6 Wisconsin 1970 4 0.168776
The data includes observed values for states and dates back to 1970. For some states it starts later than 1970.
I want to keep the states for which I can observe the data for both 1970 and 2016.
The code below includes but doesn't subset the data at all:
df.loc[(df['year'] >= 1970) & (df['year'] <= 2016)]
How can I do that in python?
Upvotes: 0
Views: 60
Reputation: 120479
If I follow the explanation of @TimRoberts, I think your are looking for:
issubset = lambda x: set([1970, 2016]).issubset(x)
out = df[df.groupby('state')['year'].transform(issubset)]
Upvotes: 2