Passing string variable value in Pandas dataframe

Question

I have been trying to use variables to pass string values into a DataFrame for various column operations, but the code gives me wrong results. Here is the code I am using in a Jupyter Notebook:

first_key = input("key 1: ")
second_key = input("key 2: ")
third_key = input("key 3: ")

These receive the values "Russia", "China" and "Trump", which I use in the next cell as follows:

tweets['{first_key}'] = tweets['text'].str.contains(r"^(?=.*\b{first_key}\b).*$", case=False) == True
tweets['{second_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{second_key}'\b).*$", case=False) == True
tweets['{third_key}'] = tweets['text'].str.contains(r"^(?=.*\b'{third_key}'\b).*$", case=False) == True

But the results are wrong. Any idea how to get the correct results?

cs95 · Accepted Answer

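I've cleaned up your code. The main problem is that '{first_key}' is an ordinary string, not an f-string, so Python never substitutes the variable: the new column is literally named {first_key}, and the regex searches for the literal text {first_key}. (The second and third patterns also wrap the placeholder in stray single quotes.) With Python 3.6+ you can use f-strings, combined with the raw-string prefix as rf"...", which needs only a tiny change to your code: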

def contains(series, key):
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)
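To make the difference concrete, here is a minimal illustration of what each kind of string literal produces. The value "Russia" is a hard-coded stand-in for what input() would return; everything else mirrors the literals used in the question.

```python
key = "Russia"  # stand-in for the value typed into input()

column = '{key}'                  # plain string: braces are kept verbatim
plain = r"^(?=.*\b{key}\b).*$"    # raw string without f: no substitution happens
interp = rf"^(?=.*\b{key}\b).*$"  # rf prefix: {key} is substituted, \b stays intact

print(column)  # {key}
print(plain)   # ^(?=.*\b{key}\b).*$
print(interp)  # ^(?=.*\bRussia\b).*$
```

Only the rf version builds the pattern you intended, which is why the original cells created oddly named columns and matched nothing useful.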

If you're working with an older version of Python (before 3.6), use str.format:

def contains(series, key):
    return series.str.contains(r"^(?=.*\b{}\b).*$".format(key), case=False)
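As a quick sanity check, here is a sketch that runs both versions side by side on a small made-up Series. The names contains_fstring and contains_format are hypothetical, used only so the two variants can coexist; both build the same pattern, so they should return identical boolean masks.

```python
import pandas as pd

def contains_fstring(series, key):
    # Python 3.6+: the rf-string substitutes {key} and keeps the regex backslashes
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

def contains_format(series, key):
    # Older Python: str.format fills the {} placeholder in the raw string
    return series.str.contains(r"^(?=.*\b{}\b).*$".format(key), case=False)

# Made-up sample text, not real tweets
text = pd.Series(["Russia and China meet", "trump tweets again", "nothing here"])

for key in ("Russia", "China", "Trump"):
    # Both helpers should agree on every row
    assert contains_fstring(text, key).equals(contains_format(text, key))
```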

Next, call this function inside a loop:

for key in (first_key, second_key, third_key):
    tweets[key] = contains(tweets['text'], key)
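Put together, the whole flow might look like the sketch below. It assumes a tweets DataFrame with a text column as in the question; the sample rows are invented, and the three keys are hard-coded in place of the input() calls so the example runs non-interactively.

```python
import pandas as pd

def contains(series, key):
    # True where `key` appears as a whole word in the text (case-insensitive)
    return series.str.contains(rf"^(?=.*\b{key}\b).*$", case=False)

# Invented sample data standing in for the real `tweets` DataFrame
tweets = pd.DataFrame({"text": [
    "Russia and China hold talks",
    "Trump comments on China trade",
    "RUSSIAN markets fall",
    "Nothing relevant here",
]})

# Hard-coded stand-ins for the input() calls in the question
first_key, second_key, third_key = "Russia", "China", "Trump"

for key in (first_key, second_key, third_key):
    # One boolean column per key, named after the key's actual value
    tweets[key] = contains(tweets["text"], key)

print(tweets)
```

Because of the \b word boundaries, "RUSSIAN" does not count as a match for "Russia". If a key might contain regex metacharacters (such as . or +), passing re.escape(key) into the pattern instead of the raw key is a reasonable precaution.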
