Ajay Kumar

Reputation: 21

Extract an item from an existing column into a new column using Python

New to Python. I want to extract the name from a DataFrame column. I am currently looping through every item in each row's list, but what is the accepted way to do this on a large CSV?

Input DF:

    Entity                                     Details
0  Entity1         street=First,name=John,postcode=123
1  Entity2                                  name=Billy
2  Entity3  street=Second,interest=walking,name=Julian
3  Entity4                                          

Code:

import pandas as pd

df = pd.DataFrame(
  {
    "Entity": ["Entity1","Entity2","Entity3","Entity4"],
    "Details": ["street=First,name=John,postcode=123",
                "name=Billy",
                "street=Second,interest=walking,name=Julian",
                ""
               ]
  }
)

print(df)

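# Split each Details string into a list of "key=value" items
df["Details"] = df["Details"].str.split(',')

# Keep only the items that start with "name="
lam = lambda details: [each_detail for each_detail in details if each_detail.startswith('name=')]

df["Name"] = df["Details"].map(lam)

# Only to change the one-item list to a string; is there an alternative?
df = df.explode("Name")

# Remove the "name=" prefix. str.lstrip('name=') would strip any leading run
# of the characters n, a, m, e and =, so it could eat into the name itself.
df["Name"] = df["Name"].str.replace(r'^name=', '', regex=True)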

print(df)

Output DF:

    Entity                                         Details    Name
0  Entity1         [street=First, name=John, postcode=123]    John
1  Entity2                                    [name=Billy]   Billy
2  Entity3  [street=Second, interest=walking, name=Julian]  Julian
3  Entity4                                              []     NaN

Upvotes: 0

Views: 140

Answers (2)

sudhish

Reputation: 98

I usually use the apply() function together with a separate helper function to process the data column.

Below, I use re to split the Details string on commas and equals signs, then pick out the value that follows name.

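import re

def get_name(row):
    # Split on ',' and '=' so keys and values alternate in one flat sequence
    it = iter(re.split(',|=', row))
    # When a token equals 'name', the next token from the same iterator is its value
    name_val = [next(it) for x in it if x == 'name']
    if name_val:
        return name_val[0]
    else:
        # No name found: return an empty list
        return []

df['Name'] = df['Details'].apply(get_name)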

Output

    Entity  Details                                     Name
0   Entity1 street=First,name=John,postcode=123         John
1   Entity2 name=Billy                                  Billy
2   Entity3 street=Second,interest=walking,name=Julian  Julian
3   Entity4                                             []
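
A possible variant, as a rough sketch rather than a drop-in replacement: the same idea without re, returning None instead of an empty list so that rows without a name show up as missing values. The helper name get_name_or_none is hypothetical, and the sketch assumes Details still holds the original comma-separated strings.

```python
import pandas as pd

def get_name_or_none(details):
    """Return the value of the 'name' key, or None if the key is absent."""
    for item in details.split(','):
        # partition splits on the first '=' only, so values may contain '='
        key, sep, value = item.partition('=')
        if sep and key == 'name':
            return value
    return None

df = pd.DataFrame({
    "Entity": ["Entity1", "Entity2", "Entity3", "Entity4"],
    "Details": ["street=First,name=John,postcode=123",
                "name=Billy",
                "street=Second,interest=walking,name=Julian",
                ""],
})

df["Name"] = df["Details"].apply(get_name_or_none)
print(df)
```

Comparing the whole key against 'name' means a key such as surname is not picked up by accident.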

Upvotes: 0

Paul Brennan

Reputation: 2696

Here is how to use the regex-based str.extract to get the name:

df['name'] = df['Details'].str.extract(r'name=(\w+)')

This gives:

    Entity  Details                                     name
0   Entity1 street=First,name=John,postcode=123         John
1   Entity2 name=Billy                                  Billy
2   Entity3 street=Second,interest=walking,name=Julian  Julian
3   Entity4                                             NaN
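
One caveat with the pattern above is that name= can also match inside a longer key such as surname=, and \w+ stops at the first space. The following is a minimal sketch of a stricter pattern, assuming the values never contain commas; the extra Entity5 row is made up purely to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "Entity": ["Entity1", "Entity2", "Entity3", "Entity4", "Entity5"],
    "Details": ["street=First,name=John,postcode=123",
                "name=Billy",
                "street=Second,interest=walking,name=Julian",
                "",
                "surname=Smith,name=Mary Ann"],  # hypothetical extra row
})

# Original pattern: matches the first "name=" anywhere, including inside "surname="
loose = df["Details"].str.extract(r'name=(\w+)', expand=False)

# (?:^|,) requires "name=" to start the string or follow a comma;
# [^,]* captures everything up to the next comma, spaces included
df["name"] = df["Details"].str.extract(r'(?:^|,)name=([^,]*)', expand=False)

print(loose.iloc[4])       # Smith
print(df["name"].iloc[4])  # Mary Ann
```

Passing expand=False returns a Series rather than a one-column DataFrame, which makes the assignment to a single column explicit.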

Upvotes: 1
