Reputation: 962
I have a dataframe like this:
Challenge   Points
challenge1  {'k01-001': 0.5, 'k03-015': 0.3, 'k01-005': 0.2}
challenge2  {'k02-001': 0.5, 'k06-003': 0.4, 'k04-001': 0.1}
challenge3  {'k04-001': 0.1, 'k06-003': 0.9}
challenge4  {'k01-005': 0.2, 'k01-001': 0.4, 'k03-002': 0.2, 'k01-007': 0.2}
challenge5  {'k06-003': 0.6, 'k04-001': 0.4}
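To reproduce the answers, the sample data can be built as a DataFrame like this. It is a sketch that assumes the Points column holds actual Python dicts, which is what both answers rely on (list(x) and x.keys()). If your column holds strings, as the json.loads call in the question suggests, parse them into dicts first.

```python
import pandas as pd

# Sample data from the question. Assumption: Points holds real dicts, not strings.
df = pd.DataFrame({
    "Challenge": ["challenge1", "challenge2", "challenge3", "challenge4", "challenge5"],
    "Points": [
        {"k01-001": 0.5, "k03-015": 0.3, "k01-005": 0.2},
        {"k02-001": 0.5, "k06-003": 0.4, "k04-001": 0.1},
        {"k04-001": 0.1, "k06-003": 0.9},
        {"k01-005": 0.2, "k01-001": 0.4, "k03-002": 0.2, "k01-007": 0.2},
        {"k06-003": 0.6, "k04-001": 0.4},
    ],
})
print(df)
```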
From here, I want to build a dictionary whose keys are tuples of two points that have been evaluated together in a challenge (e.g. ('k01-001', 'k01-005')) and whose values are the number of challenges in which that pair has been evaluated together. So, something like:
{('k01-001', 'k01-005'): 2, ('k01-001', 'k03-015'): 1, ('k01-005', 'k03-015'): 1, ('k04-001', 'k06-003'): 3, ... }
So far I've managed to read the individual dictionaries in the Points column using this code:
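import json

for index, row in df.iterrows():
    # Points is stored as text; swap single quotes for double quotes so json can parse it
    dict_temp = json.loads(row['Points'].replace("'", '"'))
    for key, value in dict_temp.items():
        # SOME CODE HERE
        pass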
But I'm not sure how to proceed from here.
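One way the loop above could be finished is sketched below. It assumes Points holds each dict as a string (as the json.loads call implies) and swaps json.loads plus the quote replacement for ast.literal_eval, which parses Python-style single-quoted literals directly. Sorting the keys before calling itertools.combinations keeps every pair in one fixed order, and collections.Counter does the counting. Only two of the rows are included to keep it short.

```python
import ast
from collections import Counter
from itertools import combinations

import pandas as pd

# Assumption: Points holds each dict as a string, as the question's
# json.loads(...) call implies. Only challenge1 and challenge4 are used here.
df = pd.DataFrame({
    "Challenge": ["challenge1", "challenge4"],
    "Points": [
        "{'k01-001': 0.5, 'k03-015': 0.3, 'k01-005': 0.2}",
        "{'k01-005': 0.2, 'k01-001': 0.4, 'k03-002': 0.2, 'k01-007': 0.2}",
    ],
})

pair_counts = Counter()
for _, row in df.iterrows():
    # ast.literal_eval parses the single-quoted dict text directly,
    # so the quote replacement needed by json.loads is not required
    dict_temp = ast.literal_eval(row["Points"])
    # Sorting the keys first means combinations() yields each pair in
    # one fixed order, e.g. always ('k01-001', 'k01-005')
    for pair in combinations(sorted(dict_temp), 2):
        pair_counts[pair] += 1

result = dict(pair_counts)
print(result)
```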
Upvotes: 0
Views: 274
Reputation: 30920
I would use map and reduce with a defaultdict to count:
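from collections import defaultdict
from functools import reduce
from itertools import combinations

# For each row, build every pair of keys and sort each pair, then
# concatenate the per-row tuples into one tuple of pairs.
# Assumes df['Points'] holds dicts (list(x) gives the keys).
combs = reduce(lambda x, y: x + y,
               map(lambda x: tuple(map(sorted, combinations(list(x), 2))),
                   df['Points']))

# Count how often each sorted pair occurs
d = defaultdict(int)
for comb in combs:
    d[tuple(comb)] += 1  # sorted() returns a list, so convert it to a hashable tuple
d = dict(d)
print(d)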
{('k01-001', 'k03-015'): 1, ('k01-001', 'k01-005'): 2, ('k01-005', 'k03-015'): 1,
('k02-001', 'k06-003'): 1, ('k02-001', 'k04-001'): 1, ('k04-001', 'k06-003'): 3,
('k01-005', 'k03-002'): 1, ('k01-005', 'k01-007'): 1, ('k01-001', 'k03-002'): 1,
('k01-001', 'k01-007'): 1, ('k01-007', 'k03-002'): 1}
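A variant of the same idea, offered as a sketch rather than as part of this answer: reduce(lambda x, y: x + y, ...) builds a new tuple at every step, copying everything accumulated so far. itertools.chain.from_iterable flattens the per-row pairs lazily instead, and collections.Counter replaces the defaultdict loop. Sorting each row's keys up front means every pair is already a sorted tuple, so no per-pair sorted() call is needed.

```python
from collections import Counter
from itertools import chain, combinations

import pandas as pd

# Sample data from the question, with Points holding real dicts
df = pd.DataFrame({"Points": [
    {"k01-001": 0.5, "k03-015": 0.3, "k01-005": 0.2},
    {"k02-001": 0.5, "k06-003": 0.4, "k04-001": 0.1},
    {"k04-001": 0.1, "k06-003": 0.9},
    {"k01-005": 0.2, "k01-001": 0.4, "k03-002": 0.2, "k01-007": 0.2},
    {"k06-003": 0.6, "k04-001": 0.4},
]})

# Flatten the per-row pair iterators without repeated tuple concatenation;
# sorted(points) sorts the dict's keys, so each pair comes out in sorted order
pairs = chain.from_iterable(
    combinations(sorted(points), 2) for points in df["Points"]
)
d = dict(Counter(pairs))
print(d)
```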
Timing comparison in IPython (%%timeit), this answer first and then the explode/value_counts approach:
%%timeit
combs = reduce(lambda x, y: x + y,
               map(lambda x: tuple(map(sorted, combinations(list(x), 2))),
                   df['Points']))
d = defaultdict(int)
for comb in combs:
    d[tuple(comb)] += 1
d = dict(d)
26.2 µs ± 439 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
s = (df.Points.apply(lambda x: tuple(itertools.combinations(x.keys(), 2))).explode()
     .apply(lambda x: tuple(sorted(x))).value_counts()).to_dict()
1.69 ms ± 62.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 1
Reputation: 323246
IIUC, we need itertools to get the combinations, then explode, sort the values within each tuple, and count them with value_counts:
import itertools
s = (df.Points.apply(lambda x: tuple(itertools.combinations(x.keys(), 2)))  # all key pairs per row
       .explode().apply(lambda x: tuple(sorted(x)))  # one row per pair, each pair sorted
       .value_counts())  # count each pair
Out[543]:
(k04-001, k06-003) 3
(k01-001, k01-005) 2
(k02-001, k04-001) 1
(k01-005, k03-002) 1
(k01-005, k03-015) 1
(k01-001, k03-002) 1
(k01-001, k03-015) 1
(k01-001, k01-007) 1
(k01-005, k01-007) 1
(k01-007, k03-002) 1
(k02-001, k06-003) 1
Name: Points, dtype: int64
If you need a dict:
s.to_dict()
Out[546]:
{('k04-001', 'k06-003'): 3,
('k01-001', 'k01-005'): 2,
('k02-001', 'k04-001'): 1,
('k01-005', 'k03-002'): 1,
('k01-005', 'k03-015'): 1,
('k01-001', 'k03-002'): 1,
('k01-001', 'k03-015'): 1,
('k01-001', 'k01-007'): 1,
('k01-005', 'k01-007'): 1,
('k01-007', 'k03-002'): 1,
('k02-001', 'k06-003'): 1}
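One caveat with this chain: if a challenge has fewer than two points, combinations returns an empty tuple, explode turns it into NaN, and sorted(NaN) then raises a TypeError. A minimal sketch of a guard, assuming such rows should simply contribute no pairs, is to drop the NaN rows right after explode. The single-point row challenge6 below is hypothetical, added only to trigger that case.

```python
import itertools

import pandas as pd

# Subset of the question's data plus a hypothetical single-point row
df = pd.DataFrame({
    "Challenge": ["challenge1", "challenge3", "challenge5", "challenge6"],
    "Points": [
        {"k01-001": 0.5, "k03-015": 0.3, "k01-005": 0.2},
        {"k04-001": 0.1, "k06-003": 0.9},
        {"k06-003": 0.6, "k04-001": 0.4},
        {"k02-001": 1.0},  # hypothetical: only one point, so no pairs
    ],
})

pair_counts = (
    df["Points"]
    # All pairs of keys per row (an empty tuple when a row has < 2 keys)
    .apply(lambda pts: tuple(itertools.combinations(pts.keys(), 2)))
    # One row per pair; empty tuples become NaN
    .explode()
    # Drop the NaN rows left by challenges with fewer than two points
    .dropna()
    # Sort each pair so ('b', 'a') and ('a', 'b') are counted together
    .apply(lambda pair: tuple(sorted(pair)))
    .value_counts()
    .to_dict()
)
print(pair_counts)
```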
Upvotes: 1