Reputation: 45
I have an Excel containing a specific set of data for Example :
Sno String
1 ram has a ball
2 karan has a ball
3 raj has a ball
4 seema has a ball
5 raj has a ball raj
From the above df i want to extract specific set of names and write them to new column
Sno String names count
1 ram has a ball ram 1
2 karan has a ball karan 2
3 raj has a ball raj 3
4 seema has a ball
5 raj has a ball raj 4
The Code Which i am using :
import pandas as pd
import re
df = pd.read_excel (r'<path>')
array = {'ram','karan','raj'}
count = 0
for index, row in df.iterrows():
sql = row["string"]
for i in array:
str = re.findall(i,sql)
if str:
count = count + 1
else :
continue
df["name"] = pd.Series(str)
df["count"] = pd.Series(count)
path_to_file = r'<path>'
df.to_excel(path_to_file)
Code is working but it's iterating over only a single row instead of writing the row value for each iteration
The Output which i am getting :
Sno String names count
1 ram has a ball ram 1
2 karan has a ball
3 raj has a ball
4 seema has a ball
5 raj has a ball
Can anyone help me with my problem ?
Upvotes: 1
Views: 50
Reputation: 558
problem is these two lines:
df["name"] = pd.Series(str)
df["count"] = pd.Series(count)
instead putting them into the loop, it is better that you init two lists before for loop, and append to list within loop, and insert to df after the loop, something like this:
# before the loop
name_list = []
count_list = []
for index, row in df.iterrows():
# do somthing and get current name and count
name_list.append(name)
count_list.append(count)
# add series to df
df['name'] = pd.Series(name_list)
df['count'] = pd.Series(count_list)
iterating rows like this can be slow, if you need a faster solution, think about vectorizing the operations mentioned in other answers
Upvotes: 1
Reputation: 260780
You can use pandas.Series.str.findall
:
df['String'].str.findall('|'.join(array)).map(lambda x: x[0] if x else '')
output:
0 ram
1 karan
2 raj
3
4 raj
full code:
df['names'] = df['String'].str.findall('|'.join(array)).map(lambda x: x[0] if x else '')
df['count'] = (~df['names'].eq('')).cumsum().mask(df['names'].eq(''), '')
output:
Sno String names count
0 1 ram has a ball ram 1
1 2 karan has a ball karan 2
2 3 raj has a ball raj 3
3 4 seema has a ball
4 5 raj has a ball raj raj 4
Upvotes: 1
Reputation: 323276
We can do str.findall
from pandas
df['names'] = df.String.str.findall('|'.join(array)).str[0]
df['cnt'] = df.names.notna().cumsum().mask(df.names.isna())
df
Out[176]:
String names cnt
0 ram has a ball ram 1.0
1 karan has a ball karan 2.0
2 raj has a ball raj 3.0
3 seema has a ball NaN NaN
4 raj has a ball raj 4.0
Upvotes: 1