Reputation: 2588
I have this DataFrame
car color years max_years
0 audi black 1 7
1 audi blue 2 7
2 audi purple 4 7
3 audi black 6 7
4 bmw blue 1 5
5 bmw green 2 5
6 bmw grey 5 5
7 bmw blue 20 5
8 fiat green 1 4
9 fiat green 3 4
10 fiat green 4 4
11 fiat green 10 4
If a color entry is 1 year I want to count how many more times that color appears for that car brand group up to the max years for that group.
I would like to run the isin
color condition for each car brand group, I think my problem is that the color list is not grouby('car')
and therefore the evaluation is for all cars
The result should be:
0 audi 2
1 bmw 1
2 fiat 3
Any help would be appreciated
import pandas as pd
car = ['audi', 'audi', 'audi', 'audi', 'bmw', 'bmw', 'bmw', 'bmw', 'fiat', 'fiat', 'fiat', 'fiat']
color = ['black', 'blue', 'purple', 'black', 'blue', 'green', 'grey', 'blue', 'green', 'green', 'green', 'green']
years = [1, 2, 4, 6, 1, 2, 5, 20, 1, 3, 4, 10, ]
max_years = [7, 7, 7, 7, 5, 5, 5, 5, 4, 4, 4, 4]
data = {'car': car, 'color': color, 'years': years, 'max_years': max_years}
df = pd.DataFrame(data=data)
colors = df.loc[df.years == 1]['color'].values
colour_cars = df[(df.years <= df.max_years) & df['color'].isin(colors)].groupby(['car']).size().reset_index(name='colour_cars')
print(colour_cars)
Upvotes: 2
Views: 145
Reputation: 862681
Idea is use Series.map
by Series
created with filtered DataFrame with years == 1
and compare by column color
:
colors = df.loc[df.years == 1].set_index('car')['color']
df1 = (df[(df.years <= df.max_years) & df['car'].map(colors).eq(df['color'])]
.groupby('car')
.size()
.reset_index(name='colour_cars'))
print(df1)
car colour_cars
0 audi 2
1 bmw 1
2 fiat 3
Or you can use mask
converted to integers by Series.view
, then is necessary count True
s values by sum
and pass Series
df['car']
to groupby
:
colors = df.loc[df.years == 1].set_index('car')['color']
df1 = (((df.years <= df.max_years) & df['car'].map(colors).eq(df['color']))
.view('i1')
.groupby(df['car'])
.sum()
.reset_index(name='colour_cars'))
print(df1)
car colour_cars
0 audi 2
1 bmw 1
2 fiat 3
Different idea is test first color per group by GroupBy.transform
with first
(solution working if always first year per group is 1
):
df2 = (df[(df.years <= df.max_years)]
.groupby('car')['color']
.transform('first').eq(df['color'])
.view('i1')
.groupby(df['car'])
.sum()
.reset_index(name='colour_cars'))
print(df2)
car colour_cars
0 audi 2
1 bmw 1
2 fiat 3
Upvotes: 4