aldsmith

Reputation: 53

How can I create a pandas data frame from each unique combination of multiple lists?

I'm trying to create a pandas data frame based on every unique combination of four lists of different lengths. I'm a relative beginner.

I constructed a nested list of combinations like so:

combinations = [
    [
        [
            [
                [w,x,y,z]for w in sexes
            ]
            for x in ages
        ]
        for y in destination_codes
    ] 
    for z in origin_codes
]

Each of sexes, ages, destination_codes and origin_codes is a simple list. This works fine, but I don't know how to get the result into a four-column frame with one row for each unique combination, like this:

https://i.sstatic.net/q2tEl.jpg

I tried this:

total = pd.DataFrame(columns=['origin', 'destination', 'age', 'sex'])
    for first in combinations:
        for second in first:
            for third in second:
                for fourth in third:
                    summary_table = pd.DataFrame({'Origin': [first], 'Destination': [second], 'Age': [third], 'Sex:' [fourth])
                    total.append(summary_table)

Which doesn't work at all.

Any pointers would be very helpful - I'm not sure if this is a simple error or whether I'm approaching the whole problem in the wrong way. Any thoughts?

Upvotes: 1

Views: 567

Answers (3)

Georgina Skibinski

Reputation: 13387

Try this one:

import pandas as pd
import numpy as np

sexes = ["m", "f"]
ages = ["young", "middle", "old"]
destination_codes = ["123", "039", "0230", "0249"]
origin_codes = ["304", "0430", "034i39", "430", "0349"]
combined_ = np.array([[a, b, c, d] for a in sexes for b in ages for c in destination_codes for d in origin_codes])

df = pd.DataFrame(data={"sexes": combined_[:, 0], "ages": combined_[:, 1], "destination": combined_[:, 2], "origin": combined_[:, 3]})

Upvotes: 0

hunzter

Reputation: 598

Is this what you want?

combinations = [
    [w,x,y,z]
    for w in sexes
    for x in ages
    for y in destination_codes
    for z in origin_codes
]
# Column names must follow the order of the values in each row: [w, x, y, z]
total_df = pd.DataFrame(combinations, columns=['sex', 'age', 'destination', 'origin'])

A list comprehension works here, but it gets harder to read as you add more lists. A cleaner way to do this is itertools.product:

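from itertools import product
# Pass the lists in the same order as the column names
combinations = list(product(sexes, ages, destination_codes, origin_codes))
total_df = pd.DataFrame(combinations, columns=['sex', 'age', 'destination', 'origin'])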

Upvotes: 1

Valdi_Bo

Reputation: 30981

Use itertools.product. It returns the Cartesian product of sequences given as parameters.
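
Here is a minimal sketch of how that could look for the four lists in the question. The sample values are placeholders I made up, and the column names are borrowed from the question's attempt, so adjust both to your data:

```python
from itertools import product

import pandas as pd

# Placeholder values (assumed for illustration); substitute your own lists.
sexes = ["m", "f"]
ages = ["young", "middle", "old"]
destination_codes = ["D1", "D2"]
origin_codes = ["O1", "O2", "O3"]

# product() yields one tuple per combination; the rightmost list varies fastest.
rows = list(product(origin_codes, destination_codes, ages, sexes))

# Column names follow the order of the arguments passed to product().
total = pd.DataFrame(rows, columns=["origin", "destination", "age", "sex"])

print(len(total))  # 3 * 2 * 3 * 2 combinations
print(total.head())
```

The number of rows is the product of the list lengths. If any input list contains repeated values, the frame will contain repeated rows too, so you may want to deduplicate the lists first.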

Upvotes: 0
