Sean

Reputation: 3385

How to create a new column of zipped list items from two separate columns in a DataFrame?

This question is somewhat motivated by a previous question I asked, "Pandas groupby make two columns lists separately". This time I want to create a new column where each value is a single list containing tuples of the zipped values from the other two columns. For example:

# Original DataFrame
      fruit      sport                       weather
0     apple      [baseball, basketball]      [sunny, windy]
1     banana     [swimming, hockey]          [cloudy, windy]
2     orange     [football]                  [sunny]


# Desired DataFrame
      fruit      sport                       weather             pairs
0     apple      [baseball, basketball]      [sunny, windy]      [(baseball, sunny), (basketball, windy)]
1     banana     [swimming, hockey]          [cloudy, windy]     [(swimming, cloudy), (hockey, windy)]
2     orange     [football]                  [sunny]             [(football, sunny)]

I've tried the following code, but it gives me something else:

df['pairs'] = list(zip(df['sport'], df['weather']))

# Output DataFrame
      fruit      sport                       weather             pairs
0     apple      [baseball, basketball]      [sunny, windy]      ([baseball, basketball], [sunny, windy])
1     banana     [swimming, hockey]          [cloudy, windy]     ([swimming, hockey], [cloudy, windy])
2     orange     [football]                  [sunny]             ([football], [sunny])

As you can see, it's "reversed" from what I want to do. What is the appropriate way that I should go about this? Thanks in advance.
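
For anyone who wants to reproduce this, here is a minimal sketch of the starting frame. It assumes the cells in sport and weather are plain Python lists of strings, which is how the printed frame above reads.

```python
import pandas as pd

# Rebuild the example frame; each cell in 'sport' and 'weather' is a plain list.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

print(df)
```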

Upvotes: 0

Views: 1547

Answers (3)

Dani Mesejo

Reputation: 61910

You could take advantage of the fact that map, when given several iterables, walks them in parallel (effectively a built-in zip), and do:

df['pairs'] = [list(x) for x in map(zip, df['sport'], df['weather'])]
print(df)

Output

    fruit  ...                                     pairs
0   apple  ...  [(baseball, sunny), (basketball, windy)]
1  banana  ...     [(swimming, cloudy), (hockey, windy)]
2  orange  ...                       [(football, sunny)]

[3 rows x 4 columns]
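
To see why the map version works, here is a small sketch with plain lists standing in for the two DataFrame columns. The names sports, weathers, via_map and via_lambda are made up for illustration. When map receives more than one iterable, it takes one item from each per call, so zip is called once per row with that row's two lists.

```python
sports = [["baseball", "basketball"], ["swimming", "hockey"], ["football"]]
weathers = [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]]

# map(zip, sports, weathers) calls zip(sports[i], weathers[i]) for each i,
# so each row's two lists are zipped together element by element.
via_map = [list(pairs) for pairs in map(zip, sports, weathers)]

# The same call spelled out with an explicit two-argument function.
via_lambda = [list(pairs) for pairs in map(lambda s, w: zip(s, w), sports, weathers)]

print(via_map[0])  # [('baseball', 'sunny'), ('basketball', 'windy')]
```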

Or you could use itertuples:

df['pairs'] = [list(zip(*x)) for x in df[['sport', 'weather']].itertuples(index=False)]
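
A self-contained sketch of the itertuples variant, assuming the same three-row frame as in the question. The intermediate rows variable is only there to show what each step yields.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# itertuples(index=False) yields one namedtuple per row: (sport_list, weather_list).
rows = list(df[["sport", "weather"]].itertuples(index=False))

# zip(*row) unpacks the row, which is the same as zip(sport_list, weather_list).
df["pairs"] = [list(zip(*row)) for row in rows]

print(df["pairs"].tolist())
```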

Upvotes: 1

Erfan

Reputation: 42906

Use DataFrame.apply over axis=1 with zip:

df['pairs'] = df.apply(lambda x: list(zip(x['sport'], x['weather'])), axis=1)

Output:

    fruit                   sport          weather                                     pairs
0   apple  [baseball, basketball]   [sunny, windy]  [(baseball, sunny), (basketball, windy)]
1  banana      [swimming, hockey]  [cloudy, windy]     [(swimming, cloudy), (hockey, windy)]
2  orange              [football]          [sunny]                       [(football, sunny)]
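
The same idea as a runnable sketch, with the lambda pulled out into a named function (zip_row is a hypothetical name, not anything from pandas) so that what happens per row is easier to read:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

def zip_row(row):
    # With axis=1, row is a Series holding one DataFrame row;
    # pair up the items of its two list cells.
    return list(zip(row["sport"], row["weather"]))

df["pairs"] = df.apply(zip_row, axis=1)

print(df["pairs"].tolist())
```

Note that apply with axis=1 calls the function once per row in Python, so on large frames the list-comprehension approaches are usually faster.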

Upvotes: 1

Quang Hoang

Reputation: 150735

I think you are missing another list(zip()):

df['pairs'] = [list(zip(a, b)) for a, b in zip(df['sport'], df['weather'])]

Output:

    fruit    sport                       weather              pairs
 0  apple    ['baseball', 'basketball']  ['sunny', 'windy']   [('baseball', 'sunny'), ('basketball', 'windy')]
 1  banana   ['swimming', 'hockey']      ['cloudy', 'windy']  [('swimming', 'cloudy'), ('hockey', 'windy')]
 2  orange   ['football']                ['sunny']            [('football', 'sunny')]
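
To illustrate the difference between one zip and two, here is a sketch using plain lists in place of the columns (sports, weathers, outer_only and nested are illustrative names):

```python
sports = [["baseball", "basketball"], ["swimming", "hockey"], ["football"]]
weathers = [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]]

# One zip only: pairs each row's whole sport list with its whole weather list.
outer_only = list(zip(sports, weathers))

# Two zips: the outer one walks the rows, the inner one pairs items within a row.
nested = [list(zip(s, w)) for s, w in zip(sports, weathers)]

print(outer_only[0])  # (['baseball', 'basketball'], ['sunny', 'windy'])
print(nested[0])      # [('baseball', 'sunny'), ('basketball', 'windy')]
```

One thing to keep in mind: zip stops at the shorter input, so if a row's two lists have different lengths, the extra items are dropped without any error.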

Upvotes: 2
