Witness1123

Reputation: 93

Word length count with for/if loop pandas

I have a DataFrame and need to compute the average word length of the Word column for each Concept, separately for rows whose Note column contains "tupi" and rows where it does not.

For each Concept in a df: 
  if Note contains ("tupi") -> count word length for these Words.    
  if not -> count word length for others

  print (Concept + " tupi " + word_length)
  print (Concept + " not tupi " + word_length)

And the output should be something like:

ANTEATER tupi 5.034

ANTEATER not tupi 4.56
_______
WILD CAT tupi 4.55

WILD CAT not tupi 3.44

Input dataframe example:

Language  Concept   Word      Borrowing  Note
First     ANTEATER  tamanduá  YES        loan from tupi
Second    ANTEATER  uãiarú
Third     ANTEATER  atãn
Fourth    ANTEATER  aatãm     YES        loan from tupi
Fifth     WILD CAT  tamano    YES
Sixth     WILD CAT  sdfsg     YES
Seventh   WILD CAT  tamano    YES        loan from tupi
Eigth     WILD CAT  sdfsg     YES        loan from tupi

Upvotes: 1

Views: 92

Answers (1)

Shaido

Reputation: 28322

You can do this entirely in pandas without the need for a for-loop.

  • Create a boolean column tupi that indicates whether the Note column contains 'tupi'.
  • Create a Word Length column holding the length of each word in the Word column.

Now, use groupby and compute the average word length of each Concept with and without 'tupi' in the Note column:

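# True where Note mentions 'tupi'; na=False treats missing notes as False
df['tupi'] = df['Note'].str.contains('tupi', na=False)
# Number of characters in each word
df['Word Length'] = df['Word'].str.len()
# Mean word length per (Concept, tupi) pair
df.groupby(['Concept', 'tupi'])['Word Length'].mean()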

Resulting Series (indexed by Concept and tupi) from the given data:

Concept   tupi 
ANTEATER  False    5.0
          True     6.5
WILD CAT  False    5.5
          True     5.5
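If you want the printed lines in the format from the question, the steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: the DataFrame is rebuilt by hand from the sample in the question (empty Borrowing and Note cells are assumed to be missing values), and the names `means` and `lines` are only illustrative. The `str.normalize("NFC")` call is an extra precaution I added so that accented letters stored as a base letter plus a combining mark still count as one character.

```python
import pandas as pd

# Sample data retyped from the question; empty cells are assumed to be missing (None).
df = pd.DataFrame({
    "Language": ["First", "Second", "Third", "Fourth",
                 "Fifth", "Sixth", "Seventh", "Eigth"],
    "Concept": ["ANTEATER"] * 4 + ["WILD CAT"] * 4,
    "Word": ["tamanduá", "uãiarú", "atãn", "aatãm",
             "tamano", "sdfsg", "tamano", "sdfsg"],
    "Borrowing": ["YES", None, None, "YES", "YES", "YES", "YES", "YES"],
    "Note": ["loan from tupi", None, None, "loan from tupi",
             None, None, "loan from tupi", "loan from tupi"],
})

# Boolean flag: True where Note mentions 'tupi', False otherwise (including missing notes).
df["tupi"] = df["Note"].str.contains("tupi", na=False)

# Character count per word; NFC normalization is a precaution for accented letters.
df["Word Length"] = df["Word"].str.normalize("NFC").str.len()

# Mean word length for every (Concept, tupi) combination.
means = df.groupby(["Concept", "tupi"])["Word Length"].mean()

# Build one output line per group, in the "<Concept> tupi / not tupi <mean>" format.
lines = []
for (concept, is_tupi), mean_len in means.items():
    label = "tupi" if is_tupi else "not tupi"
    lines.append(f"{concept} {label} {mean_len:.2f}")

print("\n".join(lines))
```

Because groupby sorts its keys, the "not tupi" line (False) comes before the "tupi" line (True) for each Concept.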

Upvotes: 2
