sariii

Reputation: 2140

How to convert a DataFrame to a tensor

I have a dataframe like this:

         ids  dim
0         1    2
1         1    0
2         1    1
3         2    1
4         2    2
5         3    0
6         3    2
7         4    1
8         4    2
9         Nan  0
10        Nan  1
11        Nan  0

I want to build a TensorFlow tensor from it. The columns of the tensor correspond to the dim column in df: since dim has three distinct values (0, 1, 2), the tensor has three columns.

The values of the tensor are the associated ids from df, so the result should look like this:

1   1   1
Nan 2   2
3   Nan 3
Nan 4   4

What I did:

I tried converting the df to a NumPy array and then to a tensor, but the result does not look like what I want:

tf.constant(df[['ids', 'dim']].values, dtype=tf.int32)

Upvotes: 1

Views: 1581

Answers (3)

Try PyTorch:

item = torch.tensor(df.values)

Upvotes: 0

thushv89

Reputation: 11333

You can use pd.pivot_table() for a concise computation: pivot on ids and dim with a helper column of ones, then multiply each row by its id.

import numpy as np
import pandas as pd
import tensorflow as tf

df = pd.DataFrame([[1, 2],
                   [1, 0],
                   [1, 1],
                   [2, 1],
                   [2, 2],
                   [3, 0],
                   [3, 2],
                   [4, 1],
                   [4, 2],
                   [np.nan, 0],
                   [np.nan, 1],
                   [np.nan, 0]], columns=['ids', 'dim'])

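# Helper column of ones so every (ids, dim) pair present in df gets a value
df['val'] = 1
# One row per id, one column per dim; missing pairs become NaN, rows with NaN ids are dropped
df = df.pivot_table(index='ids', columns='dim', values='val')
# Scale each row by its id so the cells hold the id instead of 1
df = df.multiply(np.array(df.index), axis=0)

tensor = tf.constant(df)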
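
A variant of the same idea, offered as a sketch rather than as the answer's exact method: drop the rows with missing ids first, copy ids into a helper column (called val here, as in the answer), and use DataFrame.pivot so the cells already hold the id and no multiply step is needed. This assumes each (ids, dim) pair occurs at most once after dropping NaN, which holds for the sample data. The names valid, wide and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 1, 1, 2, 2, 3, 3, 4, 4, np.nan, np.nan, np.nan],
    "dim": [2, 0, 1, 1, 2, 0, 2, 1, 2, 0, 1, 0],
})

# Rows without an id cannot be placed in the grid, so drop them first.
valid = df.dropna(subset=["ids"])
# Helper column holding the id itself; it becomes the cell value.
valid = valid.assign(val=valid["ids"])
# One row per id, one column per dim; missing pairs become NaN.
wide = valid.pivot(index="ids", columns="dim", values="val")
result = wide.to_numpy(dtype=np.float32)
print(result)
```

Passing result to tf.constant should then give a float32 tensor of shape (4, 3) with the NaN entries preserved.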

Upvotes: 1

Davinder Singh

Reputation: 2162

Check my code:

import numpy as np
import pandas as pd
import tensorflow as tf


df = pd.DataFrame([[1, 2],
                   [1, 0],
                   [1, 1],
                   [2, 1],
                   [2, 2],
                   [3, 0],
                   [3, 2],
                   [4, 1],
                   [4, 2],
                   [np.nan, 0],
                   [np.nan, 1],
                   [np.nan, 0]], columns=['ids', 'dim'])
dim_array = np.array(df['dim'])
sort = dim_array.argsort()
final = np.array([df.ids[sort]]).reshape((3, 4)).T
final_result = tf.constant(final, dtype=tf.int32) # use tf.float32 to retain nan in tensor
print(final_result)

# <tf.Tensor: shape=(4, 3), dtype=int32, numpy=
# array([[          1,           1,           1],
#        [          3,           2,           2],
#        [-2147483648,           4,           3],
#        [-2147483648, -2147483648,           4]], dtype=int32)>

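An integer tensor cannot represent NaN, so with dtype=tf.int32 the NaN entries are replaced by an arbitrary integer (here -2147483648); use tf.float32 to keep them. Note that this approach packs each column's ids from the top and relies on every dim value occurring exactly four times, so the rows are not aligned by id as in the desired output.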
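
To keep the rows aligned by id and to control what happens to the missing cells, one option is to scatter the ids into a NaN-filled NumPy grid. This is a sketch under assumptions, not the method used above: the names grid and as_int and the sentinel value -1 are made up for illustration, and -1 only works if it can never be a real id.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 1, 1, 2, 2, 3, 3, 4, 4, np.nan, np.nan, np.nan],
    "dim": [2, 0, 1, 1, 2, 0, 2, 1, 2, 0, 1, 0],
})

# Rows without an id have no row in the grid, so drop them.
valid = df.dropna(subset=["ids"])
ids = valid["ids"].to_numpy()
dims = valid["dim"].to_numpy()

# Map each distinct id / dim to a row / column position.
row_labels, row_pos = np.unique(ids, return_inverse=True)
col_labels, col_pos = np.unique(dims, return_inverse=True)

# Start from an all-NaN float grid and write each id into its cell.
grid = np.full((len(row_labels), len(col_labels)), np.nan, dtype=np.float32)
grid[row_pos, col_pos] = ids

# For an integer tensor, replace NaN with an explicit sentinel (assumed: -1).
as_int = np.where(np.isnan(grid), -1, grid).astype(np.int32)
print(grid)
print(as_int)
```

Either array can then be passed to tf.constant: grid as a float tensor that keeps NaN, or as_int as an int32 tensor with the sentinel in place of the missing cells.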

Upvotes: 1
