Zhang18

Reputation: 4930

Convert pandas dataframe elements to tuple

I have a dataframe:

>>> df = pd.DataFrame(np.random.random((3,3)))
>>> df
          0         1         2
0  0.732993  0.611314  0.485260
1  0.935140  0.153149  0.065653
2  0.392037  0.797568  0.662104

What is the easiest way to convert each entry to a 2-tuple, with the first element taken from the current entry and the second element taken from the same row of the last column (2)?

i.e. I want the final results to be:

                      0                    1                      2
0  (0.732993, 0.485260)  (0.611314, 0.485260)  (0.485260, 0.485260)
1  (0.935140, 0.065653)  (0.153149, 0.065653)  (0.065653, 0.065653)
2  (0.392037, 0.662104)  (0.797568, 0.662104)  (0.662104, 0.662104)

Upvotes: 2

Views: 3041

Answers (2)

cs95

Reputation: 402363

As of pandas 0.20, you can use df.transform:

In [111]: df
Out[111]: 
   0  1  2
0  1  3  4
1  2  4  5
2  3  5  6

In [112]: df.transform(lambda x: list(zip(x, df[2])))
Out[112]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)

Or, another solution using df.apply:

In [113]: df.apply(lambda x: list(zip(x, df[2])))
Out[113]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6) 

You can also use a dict comprehension:

In [126]: pd.DataFrame({i : df[[i, 2]].apply(tuple, axis=1) for i in df.columns})
Out[126]: 
        0       1       2
0  (1, 4)  (3, 4)  (4, 4)
1  (2, 5)  (4, 5)  (5, 5)
2  (3, 6)  (5, 6)  (6, 6)
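
For anyone who wants to paste and run this, here is a minimal self-contained sketch of the same idea. It zips each column with the last one inside a plain dict comprehension, so it does not go through apply or transform at all. The names last and paired are just illustrative, and the sample frame is the small integer one used above.

```python
import pandas as pd

# Same small integer frame as in the examples above.
df = pd.DataFrame([[1, 3, 4], [2, 4, 5], [3, 5, 6]])

# Select the last column by position so the code does not depend on its label.
last = df.iloc[:, -1]

# Pair each column with the last column, one column at a time.
# `paired` is an illustrative name, not part of any pandas API.
paired = pd.DataFrame(
    {col: list(zip(df[col], last)) for col in df.columns},
    index=df.index,
)

print(paired)
```

Passing index=df.index keeps the original row labels, which matters if the frame does not use the default 0..n-1 index.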

Upvotes: 3

Rakesh Adhikesavan

Reputation: 12826

I agree with Corley's comment that you are better off leaving the data in its current format and changing your algorithm to read the second value explicitly from the last column.
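
As a rough illustration of that point: the question does not say what the tuples are for, so the sketch below assumes, purely as an example, that the next step is the difference between each entry and the last column's value in the same row. In that case the numeric frame can be used directly and no tuples are needed. The names last and diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1.0, 3.0, 4.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]])

# Keep the data numeric and pull out the last column once.
last = df.iloc[:, -1]

# Assumed downstream computation (not stated in the question):
# subtract the last column's value from every entry in the same row.
# axis=0 aligns `last` with the row index, so it is applied row by row.
diff = df.sub(last, axis=0)

print(diff)
```

This stays vectorised, whereas a frame of tuples has object dtype and forces Python-level loops for any later arithmetic.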

However, to answer your question, you can define a function that does what's desired and call it using apply.

I don't like this answer: it is ugly, and "apply" is syntactic sugar for a for loop, so you are definitely better off not using it:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((3,3)))


df
          0         1         2
0  0.847380  0.897275  0.462872
1  0.161202  0.852504  0.951304
2  0.093574  0.503927  0.986476


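def make_tuple(row):
    # Pair every value in the row with the row's last value (by position).
    last = row.iloc[-1]
    # Return a Series so that apply expands the result back into columns.
    return pd.Series([(x, last) for x in row], index=row.index)

df.apply(make_tuple, axis=1)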


                                   0                                 1  \
0   (0.847379908309, 0.462871875315)  (0.897274903359, 0.462871875315)
1   (0.161202442072, 0.951303842798)  (0.852504052133, 0.951303842798)
2  (0.0935742441563, 0.986475692614)  (0.503927404884, 0.986475692614)

                                  2
0  (0.462871875315, 0.462871875315)
1  (0.951303842798, 0.951303842798)
2  (0.986475692614, 0.986475692614)

Upvotes: 0
