Comparing length of specific column rows in dataframe, python

Question

Input:

DF1:
name, message
adam, hello, i'am
viola, hi, my name is

data:
name, message
adam, hello, i'am
viola, hi, my name

I want to compare the message lengths for matching names (for example, adam in DF1 and adam in data). If the lengths are the same, do nothing; otherwise print that row.

Code:

if df['message'].apply(lambda x: len(x)) == data['name'].apply(lambda x: len(x)):
    pass
else:
    df['message'].apply(lambda x: print(x)) 
    #edit: i can use maybe df.loc[:,'message'] as well i think

But I am receiving: TypeError: object of type 'float' has no len(), why?

sg.sysel · Accepted Answer

There might be a better way, but this could work for you:

import pandas
dt = pandas.DataFrame([["Adam","Hello, I am Adam"], ["Viola", "How are you"]], columns=["name", "message"])
data = pandas.DataFrame([["Adam","Hello, I am Adam"], ["Viola", "How are ya"]], columns=["name", "message"])

print(dt)
print(data)

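# Rename the message column in data so it does not clash in the merge.
data.columns = ["name", "message_data"]

# The default inner merge keeps only names present in both dataframes.
merged = dt.merge(data, on=["name"])
# Show the rows whose message lengths differ.
print(merged[merged.message.str.len() != merged.message_data.str.len()])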

First, you need to rename the ["message"] column in one of the dataframes, so that it doesn't clash in the merge. Then you merge both dataframes on ["name"], which keeps only names that exist in both dataframes. Finally, you compare the lengths of the strings in ["message"] with those in ["message_data"] and use the result as a mask to extract the rows of the merged table where the lengths differ.
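The steps above can also be sketched without the manual rename, by letting merge rename the clashing columns through its suffixes parameter. This is a variant, not the code from the answer: the extra "Eve" row and the "_dt"/"_data" suffixes are made up for illustration, to show that a name missing from one dataframe drops out of the result.

```python
import pandas as pd

# Hypothetical sample data; "Eve" appears only in the first dataframe.
dt = pd.DataFrame(
    [["Adam", "Hello, I am Adam"], ["Viola", "How are you"], ["Eve", "Hi"]],
    columns=["name", "message"],
)
data = pd.DataFrame(
    [["Adam", "Hello, I am Adam"], ["Viola", "How are ya"]],
    columns=["name", "message"],
)

# suffixes renames the clashing "message" columns during the merge,
# so no manual rename is needed. The default inner join drops "Eve".
merged = dt.merge(data, on="name", suffixes=("_dt", "_data"))

# Keep only the rows where the two message lengths differ.
mask = merged["message_dt"].str.len() != merged["message_data"].str.len()
different = merged[mask]
print(different)
```

Here only the Viola row is left, because "How are you" and "How are ya" have different lengths while Adam's messages match.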

If you specifically want only the message, you can do:

merged.loc[merged.message.str.len() != merged.message_data.str.len(), "message"]

Printing the result line-by-line should be straightforward.
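As for the TypeError in the question: one likely cause, assuming the real data has missing values in the message column, is that pandas stores a missing value as NaN, which is a float, so len(x) fails on it. The sketch below reproduces that with made-up data and shows that .str.len() returns NaN for such rows instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical data: viola's message is missing, so pandas stores NaN (a float).
df = pd.DataFrame({"name": ["adam", "viola"], "message": ["hello, i'am", np.nan]})

error_text = ""
try:
    # len() is called on every value, including the float NaN.
    df["message"].apply(lambda x: len(x))
except TypeError as exc:
    error_text = str(exc)
    print("apply(len) failed:", error_text)

# .str.len() does not raise on missing values; it yields NaN for them.
lengths = df["message"].str.len()
print(lengths)

# Printing line by line: loop over the non-missing messages.
for message in df["message"].dropna():
    print(message)
```

Note that NaN compares unequal to everything, so a row with a missing message will show up as "different" in the length comparison above unless you drop or fill the missing values first.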
