Reputation: 419
This might already be answered on Stack Overflow and I just don't know the best way to phrase the question. I'm trying to go through a column (data["Id"]) in a DataFrame (data) that contains incomplete strings and replace them with the completed versions that I have in a list.
I can't simply reassign the column with the list values, because the values in the column are in random order and are tied to other column values in the DataFrame that are important.
I tried doing this:
for img_name in images_list:
    for label in data["Id"]:
        if label in img_name:
            data["Id"] = data["Id"].replace(label, img_name)
But my list and my column are both quite large (120,000 values each), so this method would take forever. Does anyone know a better way of going about this problem? I apologize in advance if this question is redundant, and I would much appreciate a link that answers it.
List Example:
["0img1_type1.png","1img1_type2.png","2img1_type3.png"]
data["Id"] Example:
["0img1","1img1","2img1"]
Upvotes: 1
Views: 2720
Reputation: 5464
Based on your example you can use:
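import pandas as pd

# Rebuild the example column from the question
df = pd.DataFrame([["0img1","1img1","2img1"]]).T
df.columns = ['id']
l = ["0img1_type1.png","1img1_type2.png","2img1_type3.png"]
l = set(l)
# Replace each id with the first list entry found that contains it as a substring
df['id'] = df['id'].apply(lambda x: [i for i in l if x in i][0])
df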
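It retrieves the first value found in your list that contains the column value as a substring. Converting the list into a set removes duplicates, but note that the substring test still scans every element for each row, so the set does not give a hashed lookup here. If a column value matches nothing in the list, the [0] raises an IndexError.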
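With 120,000 values on each side, scanning the whole list for every row may still be slow. If the file names always follow the pattern in your example, that is, the Id followed by an underscore and the rest of the name (an assumption taken from your sample, not something I can confirm for the full data), a dictionary keyed on the prefix gives one lookup per row. A minimal sketch, where id_to_name is just an illustrative name:

```python
import pandas as pd

# Example data from the question, shuffled to show that row order does not matter.
data = pd.DataFrame({"Id": ["2img1", "0img1", "1img1"]})
images_list = ["0img1_type1.png", "1img1_type2.png", "2img1_type3.png"]

# Assumption: every file name is "<Id>_<rest>", so the Id is the text
# before the first underscore. Build the lookup table once.
id_to_name = {name.split("_", 1)[0]: name for name in images_list}

# One dictionary lookup per row; Ids with no match keep their original value.
data["Id"] = data["Id"].map(id_to_name).fillna(data["Id"])
```

Because map works row by row, the other columns in the DataFrame stay aligned with their Id. If the Ids themselves can contain underscores, the split rule would need to change.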
Upvotes: 2