Shankar Guru
Shankar Guru

Reputation: 1161

Segregate and create dictionary from pandas dataframe

I have a dataframe as below, I want to create a dictionary as below:

example:

{'52.00,20.00': [1.5,4.1,3.6,3.1], '49.25,20.00': [0.5,2.1,1,2.1]}

i.e. {'value of A, B': List of avg(A)}

I am new to python pandas, can some one please suggest ?

+-----------+-----------+-----------------+----------------+
| A         | B         | avg(A)          | hour           |
+-----------+-----------+-----------------+----------------+
|     52.00 |     20.00 |             1.5 |              1 |
|     52.00 |     20.00 |             4.1 |             14 |
|     52.00 |     20.00 |             3.6 |             15 |
|     52.00 |     20.00 |             3.1 |             16 |
|     49.25 |     20.00 |             0.5 |              0 |
|     49.25 |     20.00 |             2.1 |              1 |
|     49.25 |     20.00 |               1 |             14 |
|     49.25 |     20.00 |             2.1 |             15 |
|     53.75 |     20.00 |            2.66 |             20 |
|     53.75 |     20.00 |            2.66 |             21 |
|     53.75 |     20.00 |            2.66 |             22 |
|     53.75 |     20.00 |            5.07 |             23 |
|     48.00 |     20.00 |            0.97 |              0 |
|     48.00 |     20.00 |            0.97 |              1 |
|     48.00 |     20.00 |            1.57 |             14 |
|     48.00 |     20.00 |            1.57 |             15 |

Upvotes: 1

Views: 61

Answers (1)

cs95
cs95

Reputation: 403110

For two columns, you can manually concatenate (for better performance), and then use as a clause for a groupby + apply operation.

df[['A', 'B']] = df[['A', 'B']].astype(str)
df['avg(A)'].groupby(df['A'] + ',' + df['B']).apply(list).to_dict()

{
    "48.0,20.0": [
        0.97,
        0.97,
        1.57,
        1.57
    ],
    "49.25,20.0": [
        0.5,
        2.1,
        1.0,
        2.1
    ],
    "52.0,20.0": [
        1.5,
        4.1,
        3.6,
        3.1
    ],
    "53.75,20.0": [
        2.66,
        2.66,
        2.66,
        5.07
    ]
}

For multiple columns (more than just A and B), you can use apply with str.join to create a column of comma separated numbers that should generalise to any number of columns.

c = ['A', 'B', ...]
df['avg(A)'].groupby(df[c].astype(str).apply(','.join, 1)).apply(list).to_dict()

Upvotes: 2

Related Questions