Pivot multiple columns from row to column

Question

I have a PySpark dataframe which looks like this:

| id   | name   | policy     | payment_name | count |
|------|--------|------------|--------------|-------|
| 2    | two    | 0          | Hybrid       | 58    |
| 2    | two    | 1          | Hybrid       | 2     |
| 5    | five   | 1          | Excl         | 13    |
| 5    | five   | 0          | Excl         | 70    |
| 5    | five   | 0          | Agen         | 811   |
| 5    | five   | 1          | Agen         | 279   |
| 5    | five   | 1          | Hybrid       | 600   |
| 5    | five   | 0          | Hybrid       | 2819  |

I would like each combination of policy and payment_name to become its own column holding the corresponding count, reducing the data to one row per id.

The output would look like this:

| id | name | no_policy_hybrid | no_policy_excl | no_policy_agen | policy_hybrid | policy_excl | policy_agen |
|----|------|------------------|----------------|----------------|---------------|-------------|-------------|
| 2  | two  | 58               | 0              | 0              | 2             | 0           | 0           |
| 5  | five | 2819             | 70             | 811            | 600           | 13          | 279         |

Where a combination does not exist, the value can default to 0. For example, id 2 has no combination including payment_name Excl, so those columns are set to 0 in the example output.


Answers (1)
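
One possible approach, as a minimal sketch rather than a definitive solution: build a single key from policy and payment_name, then pivot on that key. It assumes that policy 0 maps to the `no_policy_` prefix and policy 1 to `policy_` (inferred from the expected output), and that the three payment names in the sample are the only ones. The names `POLICY_LABELS`, `PAYMENT_NAMES`, `PIVOT_COLUMNS`, `pivot_policy_payment`, and the helper column `combo` are made up for illustration.

```python
# Assumed mapping, inferred from the expected output in the question.
POLICY_LABELS = {0: "no_policy", 1: "policy"}
# Assumed to be the complete set of payment names.
PAYMENT_NAMES = ["Hybrid", "Excl", "Agen"]

# Target column names in the order of the expected output:
# no_policy_hybrid, no_policy_excl, no_policy_agen, policy_hybrid, ...
PIVOT_COLUMNS = [
    f"{label}_{payment.lower()}"
    for label in POLICY_LABELS.values()
    for payment in PAYMENT_NAMES
]


def pivot_policy_payment(df):
    """Return one row per (id, name) with one count column per combination."""
    # Imported here so the constants above can be inspected without Spark.
    from pyspark.sql import functions as F

    # Combined key, e.g. policy=0 and "Hybrid" -> "no_policy_hybrid".
    combo = F.concat_ws(
        "_",
        F.when(F.col("policy") == 0, POLICY_LABELS[0]).otherwise(POLICY_LABELS[1]),
        F.lower(F.col("payment_name")),
    )
    return (
        df.withColumn("combo", combo)
        .groupBy("id", "name")
        # Listing the values explicitly fixes the column order and creates
        # a column even for combinations that never occur in the data.
        .pivot("combo", PIVOT_COLUMNS)
        .agg(F.sum("count"))
        # Combinations missing for an id come back as null; default them to 0.
        .fillna(0)
    )
```

Calling `pivot_policy_payment(df).show()` on the sample dataframe should give one row each for id 2 and id 5. Passing the value list to `pivot` also avoids the extra pass Spark otherwise makes to discover the distinct values.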
