Aviral Srivastava

Reputation: 4582

How to compute hash of all the columns in Pandas Dataframe?

df.apply can apply a function to every column of a dataframe, or only to selected columns. However, my aim is to compute the hash of a single string per row: the concatenation of the values of all the columns in that row. My current code returns NaN.

The current code is:

df["row_hash"] = df["row_hash"].apply(self.hash_string)

The function self.hash_string is:

from hashlib import sha1

def hash_string(self, value):
    return sha1(str(value).encode('utf-8')).hexdigest()

Yes, it would be easier to first merge all the columns of the Pandas dataframe, but the existing answer to that question didn't help me either.

The file that I am reading looks like this (first rows shown):

16012,16013,16014,16015,16016,16017,16018,16019,16020,16021,16022
16013,16014,16015,16016,16017,16018,16019,16020,16021,16022,16023
16014,16015,16016,16017,16018,16019,16020,16021,16022,16023,16024
16015,16016,16017,16018,16019,16020,16021,16022,16023,16024,16025
16016,16017,16018,16019,16020,16021,16022,16023,16024,16025,16026

The column names are: col_test_1, col_test_2, ..., col_test_11
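To make the question reproducible, here is a minimal sketch that rebuilds the sample rows as a DataFrame. It assumes the file has no header row (so the names are passed via `names=`), and it uses an in-memory `io.StringIO` buffer in place of the real file, whose path isn't given.

```python
import io

import pandas as pd

# The five sample rows from the question; an in-memory buffer stands in
# for the real file (its path is not given in the question).
CSV_TEXT = """\
16012,16013,16014,16015,16016,16017,16018,16019,16020,16021,16022
16013,16014,16015,16016,16017,16018,16019,16020,16021,16022,16023
16014,16015,16016,16017,16018,16019,16020,16021,16022,16023,16024
16015,16016,16017,16018,16019,16020,16021,16022,16023,16024,16025
16016,16017,16018,16019,16020,16021,16022,16023,16024,16025,16026
"""

# Assumption: the file has no header line, so the column names
# col_test_1 ... col_test_11 are supplied explicitly.
columns = [f"col_test_{i}" for i in range(1, 12)]
df = pd.read_csv(io.StringIO(CSV_TEXT), header=None, names=columns)

print(df.shape)  # (5, 11)
```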

Upvotes: 2

Views: 7524

Answers (2)

Tarifazo

Reputation: 4343

You can use apply twice: first on the row elements, then on the result:

df["row_hash"] = df.apply(lambda x: ''.join(x.astype(str)), axis=1).apply(self.hash_string)

Side note: I don't understand why you are defining hash_string as an instance method (instead of a plain function), since it doesn't use the self argument. If that causes problems, you can just pass the hashing logic as a function:

df["row_hash"] = df.apply(lambda x: ''.join(x.astype(str)), axis=1).apply(lambda value: sha1(str(value).encode('utf-8')).hexdigest())
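A self-contained sketch of this answer. Here `hash_string` is written as a plain module-level function, as the side note suggests, and the two-row frame is made-up sample data in the question's format.

```python
from hashlib import sha1

import pandas as pd


def hash_string(value):
    """Return the SHA-1 hex digest of str(value), UTF-8 encoded."""
    return sha1(str(value).encode("utf-8")).hexdigest()


# Made-up frame shaped like the question's data (integer columns).
df = pd.DataFrame(
    {"col_test_1": [16012, 16013], "col_test_2": [16013, 16014]}
)

# First apply: join each row's values (as strings) into one string.
row_strings = df.apply(lambda x: "".join(x.astype(str)), axis=1)
# Second apply: hash each concatenated string.
df["row_hash"] = row_strings.apply(hash_string)

print(row_strings.tolist())  # ['1601216013', '1601316014']
```

One thing to keep in mind: with `axis=1` each row is passed as a single Series, so if the frame mixes integer and float columns the integers are upcast to float before `astype(str)` and render as e.g. `16012.0`. Converting with `df.astype(str)` first works column by column and avoids this.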

Upvotes: 3

vital_dml

Reputation: 1266

You can create a new column that is the concatenation of all the others:

df['new'] = df.astype(str).values.sum(axis=1)
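Why `.values.sum(axis=1)` concatenates: `df.astype(str).values` is typically a NumPy array of dtype `object`, and summing an object array applies Python's `+` to the elements, which for `str` means concatenation. A tiny sketch with made-up values:

```python
import numpy as np

# An object-dtype array of Python strings, like df.astype(str).values.
arr = np.array([["1", "2", "3"], ["4", "5", "6"]], dtype=object)

# Reducing along axis=1 folds each row with "+", i.e. string concatenation.
row_strings = arr.sum(axis=1)
print(row_strings)  # ['123' '456']
```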

Then apply your hash function to it:

df["row_hash"] = df["new"].apply(self.hash_string)
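A self-contained sketch of the two-step version on made-up data in the question's format, with `hash_string` as a plain-function stand-in for `self.hash_string`. One detail worth noting: the helper column (`new`) stays in the frame, so it is dropped here; otherwise it would be concatenated too if the row strings were ever rebuilt from all columns.

```python
from hashlib import sha1

import pandas as pd


def hash_string(value):
    """SHA-1 hex digest of str(value); a plain-function stand-in for self.hash_string."""
    return sha1(str(value).encode("utf-8")).hexdigest()


df = pd.DataFrame({"col_test_1": [16012, 16013], "col_test_2": [16013, 16014]})

# Step 1: concatenate all (original) columns as strings into a helper column.
df["new"] = df.astype(str).values.sum(axis=1)

# Step 2: hash the helper column.
df["row_hash"] = df["new"].apply(hash_string)

# Drop the helper so it isn't included if rows are concatenated again later.
df = df.drop(columns="new")
print(df.columns.tolist())  # ['col_test_1', 'col_test_2', 'row_hash']
```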

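Or, as a one-liner (the NumPy array returned by .values.sum has no .apply method, so it is wrapped in a Series first):

df["row_hash"] = pd.Series(df.astype(str).values.sum(axis=1), index=df.index).apply(hash_string)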

However, you may not need a separate function here at all:

df["row_hash"] = pd.Series(df.astype(str).values.sum(axis=1), index=df.index).apply(lambda x: sha1(str(x).encode('utf-8')).hexdigest())
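All of the approaches here concatenate values with no separator, so distinct rows such as `("1", "23")` and `("12", "3")` produce the same string and therefore the same hash. If that matters for your data, here is a sketch that joins with a delimiter instead. The `row_hashes` helper is hypothetical, and the `|` separator is an assumption: it should be a character that never appears in the values.

```python
from hashlib import sha1

import pandas as pd


def row_hashes(frame, sep="|"):
    """Hypothetical helper: SHA-1 of each row's string values joined by `sep`."""
    joined = frame.astype(str).apply(sep.join, axis=1)
    return joined.apply(lambda s: sha1(s.encode("utf-8")).hexdigest())


df = pd.DataFrame({"a": ["1", "12"], "b": ["23", "3"]})

# Without a separator both rows become "123" and collide.
plain = df.astype(str).values.sum(axis=1)
print(plain.tolist())  # ['123', '123']

hashes = row_hashes(df)
print(hashes.iloc[0] != hashes.iloc[1])  # True
```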

Upvotes: 4
