Reputation: 93
I have a dataframe and I need to count the word length from the column `Word` for each `Concept` separately, depending on the `Note` column.
For each Concept in a df:
if Note contains ("tupi") -> count word length for these Words.
if not -> count word length for others
print (Concept + " tupi " + word_length)
print (Concept + " not tupi " + word_length)
And the output should be something like:
ANTEATER tupi 5.034
ANTEATER not tupi 4.56
_______
WILD CAT tupi 4.55
WILD CAT not tupi 3.44
Input dataframe example:
| Language | Concept | Word | Borrowing | Note |
|---|---|---|---|---|
| First | ANTEATER | tamanduá | YES | loan from tupi |
| Second | ANTEATER | uãiarú | | |
| Third | ANTEATER | atãn | | |
| Fourth | ANTEATER | aatãm | YES | loan from tupi |
| Fifth | WILD CAT | tamano | YES | |
| Sixth | WILD CAT | sdfsg | YES | |
| Seventh | WILD CAT | tamano | YES | loan from tupi |
| Eigth | WILD CAT | sdfsg | YES | loan from tupi |
Upvotes: 1
Views: 92
Reputation: 28322
You can do this entirely in pandas without the need for a for-loop. First, create two new columns:

- `tupi`, which indicates whether the `Note` column contains 'tupi' or not.
- `Word Length`, which holds the length of the word in the `Word` column.

Now, use `groupby` and compute the average word length of each `Concept` with and without 'tupi' in the `Note` column:
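# Flag rows whose Note mentions 'tupi'; na=False treats missing notes as False
df['tupi'] = df['Note'].str.contains('tupi', na=False)
# Length of each word in characters
df['Word Length'] = df['Word'].str.len()
df.groupby(['Concept', 'tupi'])['Word Length'].mean()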
Result for the given data (a Series indexed by Concept and tupi):
Concept   tupi
ANTEATER  False    5.0
          True     6.5
WILD CAT  False    5.5
          True     5.5
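If you want the output in the "Concept tupi / not tupi length" layout from the question, here is a minimal sketch of one way to get there. It rebuilds the sample data by hand (assuming the empty Note cells are missing values), and the names `means` and `lines` are only illustrative, not part of pandas. The two-decimal formatting is an arbitrary choice.

```python
import pandas as pd

# Sample data copied from the question's table; empty Note cells are
# assumed to be missing values (None).
df = pd.DataFrame({
    "Concept": ["ANTEATER"] * 4 + ["WILD CAT"] * 4,
    "Word": ["tamanduá", "uãiarú", "atãn", "aatãm",
             "tamano", "sdfsg", "tamano", "sdfsg"],
    "Note": ["loan from tupi", None, None, "loan from tupi",
             None, None, "loan from tupi", "loan from tupi"],
})

# na=False makes rows with a missing Note count as "not tupi".
df["tupi"] = df["Note"].str.contains("tupi", na=False)
df["Word Length"] = df["Word"].str.len()

# Average word length per (Concept, tupi) pair.
means = df.groupby(["Concept", "tupi"])["Word Length"].mean()

# Build one line per group; the index entries are (Concept, tupi) tuples.
lines = []
for (concept, is_tupi), avg in means.items():
    label = "tupi" if is_tupi else "not tupi"
    lines.append(f"{concept} {label} {avg:.2f}")

print("\n".join(lines))
```

The loop here only formats the four aggregated values; the actual counting and averaging is still done by the vectorized `groupby`.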
Upvotes: 2