Adarsh Wase

Reputation: 1890

How to replace a list with first element of list in pandas dataframe column?

I have a pandas dataframe df, which looks like this:

df = pd.DataFrame({'Name':['Harry', 'Sam', 'Raj', 'Jamie', 'Rupert'],
                   'Country':['USA', "['USA', 'UK', 'India']", "['India', 'USA']", 'Russia', 'China']})

Name           Country

Harry          USA
Sam            ['USA', 'UK', 'India']
Raj            ['India', 'USA']
Jamie          Russia
Rupert         China

Some values in the Country column are lists, and I want to replace each of those lists with its first element, so that the dataframe looks like this:

Name           Country

Harry          USA
Sam            USA
Raj            India
Jamie          Russia
Rupert         China

Upvotes: 4

Views: 1307

Answers (5)

mozway

Reputation: 260690

As you have strings, you could use a regex here:

df['Country'] = df['Country'].str.extract(r'((?<=\[["\'])[^"\']*|^[^"\']+$)', expand=False)

Output (shown as a new column, Country2, for clarity):

     Name                 Country Country2
0   Harry                     USA      USA
1     Sam  ['USA', 'UK', 'India']      USA
2     Raj        ['India', 'USA']    India
3   Jamie                  Russia   Russia
4  Rupert                   China    China

regex:

(             # start capturing
(?<=\[["\'])  # if preceded by [" or ['
[^"\']*       # get all text until " or '
|             # OR
^[^"\']+$     # get whole string if it doesn't contain " or '
)             # stop capturing
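
To try the pattern outside pandas, here is a minimal sketch using the standard re module. The helper name first_country is made up for illustration, and the sample strings are taken from the question. Series.str.extract returns the first match found anywhere in the string, which is what re.search does here.

```python
import re

# Same pattern as above, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(r'((?<=\[["\'])[^"\']*|^[^"\']+$)')

def first_country(value):
    """Return the first quoted item after '[', or the whole string if it has no quotes."""
    match = PATTERN.search(value)
    return match.group(1) if match else None

samples = ["USA", "['USA', 'UK', 'India']", "['India', 'USA']", "Russia"]
results = [first_country(s) for s in samples]
print(results)  # ['USA', 'USA', 'India', 'Russia']
```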

Upvotes: 2

ansev

Reputation: 30920

If the values are strings, you can use Series.str.strip to remove the '[' and ']' characters, then Series.str.split to turn each row into a list, and finally the .str accessor to take the first element:

df['Country'] = df['Country'].str.strip('[]').str.split(',')\
                             .str[0].str.replace("'", "")

     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China
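
One detail worth knowing: strip takes a set of characters to remove from both ends, not a pattern. As a rough plain-Python sketch of the same chain applied to a single value (the helper name first_item is hypothetical, not part of pandas):

```python
def first_item(value):
    # Strip brackets from both ends, split on commas, take the first piece,
    # then drop the single quotes around it.
    return value.strip("[]").split(",")[0].replace("'", "")

print(first_item("['USA', 'UK', 'India']"))  # USA
print(first_item("Russia"))                  # Russia
```

Because the split is on every comma, a name that itself contains a comma would be cut at that comma, so this approach assumes the items have no embedded commas.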

Upvotes: 1

JAdel

Reputation: 1616

A regex solution.

import re

tempArr = []
for val in df["Country"]:
    # list-like strings start with "[": keep only the first run of letters
    if val.startswith("["):
        val = re.findall(r"[A-Za-z]+", val)[0]
    tempArr.append(val)

df["Country"] = tempArr

df

     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China

Upvotes: 1

user17242583

Reputation:

Try this:

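import ast
# keep list-like strings; wrap plain values as "['value']" so every row parses as a list
wrapped = df['Country'].where(df['Country'].str.contains('[', regex=False), "['" + df['Country'] + "']")
df['Country'] = wrapped.apply(ast.literal_eval).str[0]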

Output:

>>> df
     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China

Upvotes: 1

TheFaultInOurStars

Reputation: 3608

Try something like:

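import ast

def changeStringList(value):
    # parse list-like strings such as "['USA', 'UK']" and keep the first item
    try:
        return ast.literal_eval(value)[0]
    except (ValueError, SyntaxError, TypeError, IndexError):
        # not a parsable, non-empty list: keep the original value
        return value

df["Country"] = df["Country"].apply(changeStringList)
df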

Output

     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China

Note that changeStringList tries to parse each value as a Python literal and, if that gives a list, returns its first element. If the value cannot be parsed as a list (for example, a plain country name), the function returns the value unchanged.
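
To make the fallback path concrete, here is a small standalone sketch of how ast.literal_eval reacts to each kind of value. The function name first_or_self is hypothetical and this is only one way to write it: a plain name like USA raises ValueError, a value with a space like South Korea raises SyntaxError, and only a real list literal parses.

```python
import ast

def first_or_self(value):
    """Return the first element of a list-like string, else the value unchanged."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # "USA" is a bare name (ValueError); "South Korea" is not valid syntax
        return value
    # Only index into non-empty lists or tuples; anything else is returned as-is
    if isinstance(parsed, (list, tuple)) and parsed:
        return parsed[0]
    return value

print(first_or_self("['India', 'USA']"))  # India
print(first_or_self("USA"))               # USA
print(first_or_self("South Korea"))       # South Korea
```

Checking the parsed type explicitly avoids indexing into something that parsed successfully but is not a list, such as a number.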

Upvotes: 1
