I have a dataset with sequence tuples and targets like the following:
   input_0  input_1  input_2  output
0        0      1.0      2.0     4.0
1        1      2.0      4.0     2.0
2        2      4.0      2.0     4.0
3        4      2.0      4.0     7.0
4        2      4.0      7.0     8.0
I have trained algorithms using the output as a target value.
What I want, though, is to get the two most likely output values for a given input tuple.
For example, if I have two tuples for training, a,b,c,d and a,b,c,e, I want to get d and e as the result, each with its respective percentage.
Is something like that possible?
Reputation: 76376
From your comments, this seems to be a pandas.DataFrame. Say you start with
from collections import Counter
import pandas as pd

df = pd.DataFrame({
    'input_0': [1, 1, 2, 4, 2],
    'input_1': [1, 1, 2, 4, 4],
    'input_2': [2, 2, 2, 4, 7],
    'output': [4, 3, 4, 7, 8]})
>>> df
   input_0  input_1  input_2  output
0        1        1        2       4
1        1        1        2       3
2        2        2        2       4
3        4        4        4       7
4        2        4        7       8
Then the following will show the two most common output values for each input tuple, along with their counts:
>>> df.output.groupby([df.input_0, df.input_1, df.input_2]).apply(lambda s: Counter(s).most_common(2)).reset_index()
   input_0  input_1  input_2            output
0        1        1        2  [(3, 1), (4, 1)]
1        2        2        2          [(4, 1)]
2        2        4        7          [(8, 1)]
3        4        4        4          [(7, 1)]
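The question asks for percentages rather than raw counts. One possible variation, sketched here on the same example DataFrame, is to swap Counter for value_counts(normalize=True), which returns each output value's share within its group. The helper name top_two_shares is made up for this illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'input_0': [1, 1, 2, 4, 2],
    'input_1': [1, 1, 2, 4, 4],
    'input_2': [2, 2, 2, 4, 7],
    'output': [4, 3, 4, 7, 8]})

def top_two_shares(s):
    # Hypothetical helper: relative frequency of each output value within
    # one input tuple, sorted in descending order; keep the two largest.
    shares = s.value_counts(normalize=True).head(2)
    return list(shares.items())

# One row per distinct input tuple, with a list of (value, share) pairs.
result = (df.groupby(['input_0', 'input_1', 'input_2'])['output']
            .apply(top_two_shares)
            .reset_index())
print(result)
```

Each share is a fraction between 0 and 1, so multiply by 100 if a percentage is wanted. When two values are tied, the order in which they appear is not something to rely on.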