kiyis_stats

Reputation: 35

How to select rows based on being a subset of values in a list

Data

I have the following data:

import pandas as pd

data = {'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama', 'Wisconsin', 'Wisconsin'],
        'year': [1989, 1989, 1989, 1989, 1990, 2016, 1970],
        'quarter': [1, 2, 3, 4, 1, 4, 4],
        'v': [3.984353, 4.427839, 4.173073, 3.485882, 3.865541, 0.168776, 0.168776]}
df = pd.DataFrame(data)

       state  year  quarter         v
0    Alabama  1989        1  3.984353
1    Alabama  1989        2  4.427839
2    Alabama  1989        3  4.173073
3    Alabama  1989        4  3.485882
4    Alabama  1990        1  3.865541
5  Wisconsin  2016        4  0.168776
6  Wisconsin  1970        4  0.168776

The data contains observed values per state. The observations go back to 1970, but for some states they start later than 1970.

Goal

I want to keep the states for which I can observe the data for both 1970 and 2016.

Code

The code below only filters on the year range, so it keeps every row and doesn't subset the data at all:

df.loc[(df['year'] >= 1970) & (df['year'] <= 2016)]

How can I do that in Python?

Upvotes: 0

Views: 60

Answers (1)

Corralien

Reputation: 120479

If I follow the explanation of @TimRoberts, I think you are looking for:

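# True for a state whose years include both 1970 and 2016
issubset = lambda x: set([1970, 2016]).issubset(x)
# transform repeats that per-state result on every row of the state
out = df[df.groupby('state')['year'].transform(issubset)]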
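
To check the idea against the sample data from the question, here is a minimal self-contained sketch. The names `required`, `mask`, and `out_filter` are only illustrative, and the `groupby(...).filter(...)` variant is an alternative I am adding for comparison, not part of the snippet above. It assumes "keep the state" means keeping all of that state's rows.

```python
import pandas as pd

data = {'state': ['Alabama', 'Alabama', 'Alabama', 'Alabama', 'Alabama', 'Wisconsin', 'Wisconsin'],
        'year': [1989, 1989, 1989, 1989, 1990, 2016, 1970],
        'quarter': [1, 2, 3, 4, 1, 4, 4],
        'v': [3.984353, 4.427839, 4.173073, 3.485882, 3.865541, 0.168776, 0.168776]}
df = pd.DataFrame(data)

# Years that a state must have in order to be kept (illustrative name)
required = {1970, 2016}

# Per state: do its years contain every required year?
# transform broadcasts the single True/False back onto each row of the state.
mask = df.groupby('state')['year'].transform(lambda years: required.issubset(years))
out = df[mask.astype(bool)]

# Alternative: groupby.filter drops whole groups for which the function returns False.
out_filter = df.groupby('state').filter(lambda g: required.issubset(g['year']))

print(out)
```

With the sample data, Alabama is dropped because it has neither 1970 nor 2016, and both Wisconsin rows are kept. The `filter` version reads a little closer to the stated goal, while the `transform` version gives you a reusable boolean mask.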

Upvotes: 2
