Chris

Reputation: 515

Cartesian product of a DataFrame and list

I have a list of items and a dataframe. I want to pair every row of the dataframe with every item in the list: each row is copied once per item, and the item goes into a new column. For example, if the dataframe has 3 rows and the list has 4 items, the result should be a dataframe that went from 3 rows to 12 rows (3 rows times 4 items in the list). I tried converting df to a list and then iterating with append and extend, but it wasn't what I wanted: it kept appending values to the same list rather than copying the row into a new list and appending only the current item.

  group     start       stop
0   abc  1/1/2016   8/1/2016
1   xyz  5/1/2016  12/1/2016
2   jkl  3/7/2017  1/31/2018

b = ['a','b','c','d']

The expected result is a dataframe like this:

group     start       stop  new col
  abc  1/1/2016   8/1/2016        a
  abc  1/1/2016   8/1/2016        b
  abc  1/1/2016   8/1/2016        c
  abc  1/1/2016   8/1/2016        d
  xyz  5/1/2016  12/1/2016        a
  xyz  5/1/2016  12/1/2016        b
  xyz  5/1/2016  12/1/2016        c
  xyz  5/1/2016  12/1/2016        d
  jkl  3/7/2017  1/31/2018        a
  jkl  3/7/2017  1/31/2018        b
  jkl  3/7/2017  1/31/2018        c
  jkl  3/7/2017  1/31/2018        d

Upvotes: 0

Views: 560

Answers (2)

cs95

Reputation: 403238

You can do this efficiently with NumPy, using repeat to duplicate each row and np.tile to cycle through the list:

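import numpy as np
import pandas as pd

groups = ['a','b','c','d']

arr = np.column_stack([
    # repeat each row of df once per item: row 0 x4, row 1 x4, ...
    df.values.repeat(len(groups), axis=0),
    # cycle through the whole list once per row of df
    np.tile(groups, len(df))
])
pd.DataFrame(arr, columns=[*df, 'new_col'])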

   group     start       stop new_col
0    abc  1/1/2016   8/1/2016       a
1    abc  1/1/2016   8/1/2016       b
2    abc  1/1/2016   8/1/2016       c
3    abc  1/1/2016   8/1/2016       d
4    xyz  5/1/2016  12/1/2016       a
5    xyz  5/1/2016  12/1/2016       b
6    xyz  5/1/2016  12/1/2016       c
7    xyz  5/1/2016  12/1/2016       d
8    jkl  3/7/2017  1/31/2018       a
9    jkl  3/7/2017  1/31/2018       b
10   jkl  3/7/2017  1/31/2018       c
11   jkl  3/7/2017  1/31/2018       d
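
The repeat-and-tile idea above can also be written without passing the whole frame through a single NumPy array, which keeps each column's original dtype. This is only a sketch of a variation, not the code from the answer: the helper name `cross_with_list` and its `name` parameter are made up for illustration, and the sample frame is rebuilt from the question so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]


def cross_with_list(frame, items, name="new_col"):
    """Hypothetical helper: pair every row of `frame` with every item."""
    # Row positions 0,0,0,0,1,1,1,1,... so each row is repeated per item
    positions = np.repeat(np.arange(len(frame)), len(items))
    repeated = frame.iloc[positions].reset_index(drop=True)
    # Items a,b,c,d,a,b,c,d,... so the list is cycled once per original row
    repeated[name] = np.tile(items, len(frame))
    return repeated


result = cross_with_list(df, b)
print(result)
```

Selecting by position with `iloc` rather than by label means the helper behaves the same even if the frame's index contains duplicates.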

Upvotes: 1

BENY

Reputation: 323396

See "Performant cartesian product (CROSS JOIN) with pandas". Applied to your data, merge on a constant dummy key and then drop it:

newdf = df.assign(key=1).merge(pd.DataFrame({'key': [1]*len(b), 'v': b}), on='key').drop(columns='key')
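
On pandas 1.2 or later, the dummy-key merge above can likely be replaced by the built-in `how="cross"` option of `DataFrame.merge`, which builds the Cartesian product directly. The sketch below assumes that pandas version; the sample frame is rebuilt from the question, and the column name `new_col` is chosen here for illustration rather than taken from this answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "group": ["abc", "xyz", "jkl"],
    "start": ["1/1/2016", "5/1/2016", "3/7/2017"],
    "stop": ["8/1/2016", "12/1/2016", "1/31/2018"],
})
b = ["a", "b", "c", "d"]

# Wrap the list in a one-column frame, then cross join it with df.
# how="cross" requires pandas >= 1.2 and takes no `on` argument.
newdf = df.merge(pd.DataFrame({"new_col": b}), how="cross")
print(newdf)
```

This avoids adding and then dropping a temporary key column, at the cost of requiring a newer pandas.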

Upvotes: 3
