Reputation: 8025
Given a DF:
pd.DataFrame({"A":[1,2,3],
"B": [{"Mon":"Closed", "Tue":"Open", "Wed":"Closed"},
{"Mon":"Open", "Tue":"Open", "Wed":"Closed"},
{"Mon":"Open", "Tue":"Open", "Wed":"Open"}]
})
How do i get a count of number of times "Closed" appears in the dict?
A B count
1 {..} 2
2 {..} 1
3 {..} 0
I really don't know how to start on this to try
Upvotes: 5
Views: 80
Reputation: 59549
You can use a Counter in a simple list comprehension.
from collections import Counter
df['count'] = [Counter(x.values())['Closed'] for x in df.B]
# A B Count
#0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
#1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
#2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
Upvotes: 0
Reputation: 195438
Straightforward .apply()
solution:
df['Count'] = df.B.apply(lambda x: sum('Closed' in v for v in x.values()))
print(df)
Prints:
A B Count
0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
Benchmark:
import perfplot
import pandas as pd
def f1(df):
df['Count'] = df.B.apply(lambda x: sum('Closed' in v for v in x.values()))
return df
def f2(df):
df['count'] = df.B.astype(str).str.count('Closed')
return df
# Commented out because of timed-out:
# def f3(df):
# df['count'] = df.B.apply(pd.Series).eq('Closed').sum(1)
# return df
def f4(df):
df['count'] = pd.DataFrame(df['B'].tolist()).stack().eq("Closed").sum(level=0)
return df
def setup(n):
A = [*range(n)]
B = [{'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} for _ in range(n)]
df = pd.DataFrame({'A': A,
'B': B})
return df
perfplot.show(
setup=setup,
kernels=[f1, f2, f4],
labels=['apply(sum)', 'str.count()', 'stack.eq()'],
n_range=[10**i for i in range(1, 7)],
xlabel='N (* len(df))',
equality_check=None,
logx=True,
logy=True)
Result:
So it seems that straightforward apply()
with sum()
is fastest.
Upvotes: 0
Reputation: 306
Use an auxiliary function:
def aux_func(x):
week_days = x.keys()
count=0
for day in week_days:
if x[day]=='Closed':
count+=1
return count
counts = [aux_func(c) for c in df.loc[:,'B'] ]
df['counts'] = counts
Upvotes: 0
Reputation: 78700
Please don't put dictionaries into dataframe columns. You are losing all the speed of vectorized operations and make values hard to access.
Clean your df
:
>>> df = pd.concat([df['A'], df['B'].apply(pd.Series)], axis=1)
>>> df
A Mon Tue Wed
0 1 Closed Open Closed
1 2 Open Open Closed
2 3 Open Open Open
Now counting 'Closed'
is easy.
>>> df['count'] = df.eq('Closed').sum(1)
>>> df
A Mon Tue Wed count
0 1 Closed Open Closed 2
1 2 Open Open Closed 1
2 3 Open Open Open 0
Upvotes: 0
Reputation: 323276
I will do
df.B.astype(str).str.count('Closed')
Out[30]:
0 2
1 1
2 0
Name: B, dtype: int64
Or
df['Cnt']=pd.DataFrame(df.B.tolist()).eq('Closed').sum(1).values
Out[35]:
0 2
1 1
2 0
dtype: int64
Upvotes: 2
Reputation: 150745
You can do an apply
:
df['count'] = df.B.apply(pd.Series).eq('Closed').sum(1)
Output:
A B count
0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2
1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1
2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0
Upvotes: 4
Reputation: 75080
You can try converting the series of dictionary into a dataframe and then stack
, then take sum of Closed
values on level=0 to get Count per row:
df['Count_closed'] = pd.DataFrame(df['B'].tolist()).stack().eq("Closed").sum(level=0)
A B Count_closed
0 1 {'Mon': 'Closed', 'Tue': 'Open', 'Wed': 'Closed'} 2.0
1 2 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Closed'} 1.0
2 3 {'Mon': 'Open', 'Tue': 'Open', 'Wed': 'Open'} 0.0
Upvotes: 3