Reputation: 21
New to Python. I want to extract the name from a DataFrame column. I am currently looping through every item of the list in each cell, but what is the accepted way to do this on a large CSV?
Input DF:
    Entity                                     Details
0  Entity1         street=First,name=John,postcode=123
1  Entity2                                  name=Billy
2  Entity3  street=Second,interest=walking,name=Julian
3  Entity4
Code:
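import pandas as pd

df = pd.DataFrame(
    {
        "Entity": ["Entity1", "Entity2", "Entity3", "Entity4"],
        "Details": [
            "street=First,name=John,postcode=123",
            "name=Billy",
            "street=Second,interest=walking,name=Julian",
            "",
        ],
    }
)
print(df)
# Split each Details string into a list of "key=value" items
df["Details"] = df["Details"].str.split(',')
# Keep only the items that start with "name"
lam = lambda details: [each_detail for each_detail in details if each_detail.startswith('name')]
df["Name"] = df["Details"].map(lam)
# Only needed to turn the one-item list into a string; is there an alternative?
df = df.explode("Name")
# Remove the "name=" prefix. str.lstrip('name=') strips any of the characters
# n, a, m, e, = from the left, so it would also eat the start of a name like "emma".
df["Name"] = df["Name"].str.replace(r'^name=', '', regex=True)
print(df)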
Output DF:
    Entity                                         Details    Name
0  Entity1         [street=First, name=John, postcode=123]    John
1  Entity2                                    [name=Billy]   Billy
2  Entity3  [street=Second, interest=walking, name=Julian]  Julian
3  Entity4                                              []     NaN
Upvotes: 0
Views: 140
Reputation: 98
I usually use the apply() function in combination with a separately defined function to process the data column.
Below, I use re to split the Details string into tokens and extract the name value.
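import re

def get_name(row):
    # Split on both ',' and '=' to get a flat [key, value, key, value, ...] list
    it = iter(re.split(',|=', row))
    # When a token equals 'name', the next token from the same iterator is its value
    name_val = [next(it) for x in it if x == 'name']
    if name_val:
        return name_val[0]
    else:
        return []

df['Name'] = df['Details'].apply(get_name)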
Output
    Entity                                     Details    Name
0  Entity1         street=First,name=John,postcode=123    John
1  Entity2                                  name=Billy   Billy
2  Entity3  street=Second,interest=walking,name=Julian  Julian
3  Entity4                                                  []
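If you would rather get a missing value than an empty list when there is no name, here is a minimal sketch of the same apply() idea using plain string methods. The helper name get_detail and its key parameter are my own and purely illustrative, and I am assuming the values themselves never contain a comma.

```python
import pandas as pd

def get_detail(details, key="name"):
    """Return the value for `key` in a 'k1=v1,k2=v2' string, or None if absent."""
    for pair in details.split(","):
        # partition splits on the first '=' only, so the value may itself contain '='
        k, sep, value = pair.partition("=")
        if sep and k == key:
            return value
    return None

df = pd.DataFrame(
    {
        "Entity": ["Entity1", "Entity2", "Entity3", "Entity4"],
        "Details": [
            "street=First,name=John,postcode=123",
            "name=Billy",
            "street=Second,interest=walking,name=Julian",
            "",
        ],
    }
)

# Rows without a name end up as a missing value instead of []
df["Name"] = df["Details"].apply(get_detail)
print(df[["Entity", "Name"]])
```

Keeping the column as strings plus missing values means pd.isna() and the .str accessor keep working on it. Note that apply() still calls a Python function per row, so on a very large CSV a vectorized string method is probably the better fit.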
Upvotes: 0
Reputation: 2696
Here is how to use a regex with str.extract to get the name:
df['name'] = df['Details'].str.extract(r'name=(\w+)')
This gives:
    Entity                                     Details    name
0  Entity1         street=First,name=John,postcode=123    John
1  Entity2                                  name=Billy   Billy
2  Entity3  street=Second,interest=walking,name=Julian  Julian
3  Entity4                                                 NaN
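One caveat: \w+ stops at the first character that is not a letter, digit or underscore, and an unanchored name= would also match inside a key such as surname=. If that matters for your data, a slightly stricter pattern may help. This is only a sketch: the fifth row is made up for illustration, and I am assuming values never contain a comma.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Entity": ["Entity1", "Entity2", "Entity3", "Entity4", "Entity5"],
        "Details": [
            "street=First,name=John,postcode=123",
            "name=Billy",
            "street=Second,interest=walking,name=Julian",
            "",
            "surname=Smith,name=Mary-Ann",  # hypothetical row, not in the question
        ],
    }
)

# (?:^|,) requires the key to start the string or follow a comma,
# so "surname=" is not mistaken for "name=".
# ([^,]+) captures everything up to the next comma.
# expand=False returns a Series rather than a one-column DataFrame.
df["name"] = df["Details"].str.extract(r"(?:^|,)name=([^,]+)", expand=False)
print(df[["Entity", "name"]])
```

Rows without a name still come back as NaN, and the whole column is processed in one vectorized call.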
Upvotes: 1