Reputation: 117
I am trying to lemmatize words in a particular column ('body') using pandas.
I have tried the following code, which I found here:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = nltk.stem.WordNetLemmatizer()
wordnet_lemmatizer = WordNetLemmatizer()
df['body'] = df['body'].apply(lambda x: "".join([Word(word).lemmatize() for word in x)
df['body'].head()
When I attempt to run the code, I get an error message that simply says
File "<ipython-input-41-c002479904b0>", line 33
df['body'] = df['body'].apply(lambda x: "".join([Word(word).lemmatize() for word in x)
^
SyntaxError: invalid syntax
I have also tried the solution presented in this post but didn't have any luck.
UPDATE: this is the full code so far
import pandas as pd
import re
import string
df1 = pd.read_csv('RP_text_posts.csv')
df2 = pd.read_csv('RP_text_comments.csv')
# Renaming columns so the post text column (currently 'selftext') matches the comment text column ('body')
df1.columns = ['author','subreddit','score','num_comments','retrieved_on','id','created_utc','body']
# Dropping columns that aren't subreddit or the post content
df1 = df1.drop(columns=['author','score','num_comments','retrieved_on','id','created_utc'])
df2 = df2.drop(columns=['author', 'score', 'created_utc'])
# Combining data
df = pd.concat([df1, df2])
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
lemmatizer = nltk.stem.WordNetLemmatizer()
wordnet_lemmatizer = WordNetLemmatizer()
stop = stopwords.words('english')
# Lemmatizing
df['body'] = df['body'].apply(lambda x: "".join([Word(word).lemmatize() for word in x)
df['body'].head()
Upvotes: 0
Views: 869
Reputation: 2819
The end of the lambda expression is missing its closing brackets:
df['words'] = df['words'].apply(lambda x: "".join([Word(word).lemmatize() for word in x]))
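Note that Word is never defined in your snippet: it comes from TextBlob, not NLTK, so the line above would still fail with a NameError once the syntax is fixed. A minimal sketch of the TextBlob variant, assuming the textblob package is installed:

from textblob import Word

# TextBlob's Word wraps a string and exposes .lemmatize(),
# which defaults to noun, like NLTK's WordNetLemmatizer
print(Word("cats").lemmatize())       # cat
print(Word("loving").lemmatize("v"))  # love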
Update: the line should look more like the one below, but note that this way you can only lemmatize with a single POS at a time (noun by default, or adjective, verb, ...):
df['words'] = df['body'].apply(lambda x: " ".join([wordnet_lemmatizer.lemmatize(word) for word in word_tokenize(x)]))
print(df.head())
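By default WordNetLemmatizer.lemmatize treats every word as a noun; passing the pos argument lemmatizes it as another part of speech:

wordnet_lemmatizer.lemmatize("loving")           # 'loving' (treated as a noun)
wordnet_lemmatizer.lemmatize("loving", pos="v")  # 'love'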
If you want POS-aware lemmatization, you can try the following code:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lemmatizer = WordNetLemmatizer()
stop = stopwords.words('english')

def nltk_tag_to_wordnet_tag(nltk_tag):
    # Map Penn Treebank tags (from nltk.pos_tag) to WordNet POS constants
    if nltk_tag.startswith('J'):
        return wordnet.ADJ
    elif nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    elif nltk_tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

def lemmatize_sentence(sentence):
    # Tokenize the sentence and find the POS tag for each token
    nltk_tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    # Tuples of (token, wordnet_tag)
    wordnet_tagged = map(lambda x: (x[0], nltk_tag_to_wordnet_tag(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is None:
            # If there is no matching WordNet tag, append the token as is
            lemmatized_sentence.append(word)
        else:
            # Otherwise use the tag to lemmatize the token
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return " ".join(lemmatized_sentence)
# Lemmatizing
df['words'] = df['body'].apply(lemmatize_sentence)
print(df.head())
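You can sanity-check the helper on a single sentence before applying it to the whole column; row 4 of the result below comes from exactly this kind of call:

print(lemmatize_sentence("I am loving it"))  # I be love it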
df result:
body | words
0 Best scores, good cats, it rocks | Best score , good cat , it rock
1 You received best scores | You receive best score
2 Good news | Good news
3 Bad news | Bad news
4 I am loving it | I be love it
5 it rocks a lot | it rock a lot
6 it is still good to do better | it be still good to do good
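If NLTK raises a LookupError about missing resources when you run this, the code relies on a few corpora and models that need a one-time download (omw-1.4 is only needed on newer NLTK versions):

import nltk
nltk.download('punkt')                       # tokenizer behind word_tokenize
nltk.download('averaged_perceptron_tagger')  # tagger behind nltk.pos_tag
nltk.download('wordnet')                     # lemmatizer dictionary
nltk.download('omw-1.4')                     # extra wordnet data on newer NLTK
nltk.download('stopwords')                   # for stopwords.words('english')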
Upvotes: 1