shubham kumar
shubham kumar

Reputation: 45

how to insert the searched string into new row in pandas

I have an Excel containing a specific set of data for Example :

Sno String
1   ram has a ball
2   karan has a ball
3   raj has a ball
4   seema has a ball
5   raj has a ball   raj

From the above df i want to extract specific set of names and write them to new column

Sno String           names  count
1   ram has a ball   ram    1
2   karan has a ball karan  2
3   raj has a ball   raj    3
4   seema has a ball
5   raj has a ball   raj    4

The Code Which i am using :

import pandas as pd
import re
df = pd.read_excel (r'<path>')

array = {'ram','karan','raj'}
count = 0
for index, row in df.iterrows():
   sql = row["string"]
   for i in array:
       str = re.findall(i,sql)
       if str:
           count = count + 1
       else :
           continue
       df["name"] = pd.Series(str)
       df["count"] = pd.Series(count)
       
path_to_file = r'<path>'
df.to_excel(path_to_file)  

Code is working but it's iterating over only a single row instead of writing the row value for each iteration

The Output which i am getting :

Sno String           names  count
1   ram has a ball   ram    1
2   karan has a ball 
3   raj has a ball   
4   seema has a ball
5   raj has a ball   

Can anyone help me with my problem ?

Upvotes: 1

Views: 50

Answers (3)

Da  Song
Da Song

Reputation: 558

problem is these two lines:

df["name"] = pd.Series(str)
df["count"] = pd.Series(count)

instead putting them into the loop, it is better that you init two lists before for loop, and append to list within loop, and insert to df after the loop, something like this:

# before the loop
name_list = []
count_list = []
for index, row in df.iterrows():
    # do somthing and get current name and count
    name_list.append(name)
    count_list.append(count)

# add series to df
df['name'] = pd.Series(name_list)
df['count'] = pd.Series(count_list)

iterating rows like this can be slow, if you need a faster solution, think about vectorizing the operations mentioned in other answers

Upvotes: 1

mozway
mozway

Reputation: 260780

You can use pandas.Series.str.findall:

df['String'].str.findall('|'.join(array)).map(lambda x: x[0] if x else '')

output:

0      ram
1    karan
2      raj
3         
4      raj

full code:

df['names'] = df['String'].str.findall('|'.join(array)).map(lambda x: x[0] if x else '')
df['count'] = (~df['names'].eq('')).cumsum().mask(df['names'].eq(''), '')

output:

   Sno                String  names count
0    1        ram has a ball    ram     1
1    2      karan has a ball  karan     2
2    3        raj has a ball    raj     3
3    4      seema has a ball             
4    5  raj has a ball   raj    raj     4

Upvotes: 1

BENY
BENY

Reputation: 323276

We can do str.findall from pandas

df['names'] = df.String.str.findall('|'.join(array)).str[0]
df['cnt'] = df.names.notna().cumsum().mask(df.names.isna())
df
Out[176]: 
              String  names  cnt
0     ram has a ball    ram  1.0
1   karan has a ball  karan  2.0
2     raj has a ball    raj  3.0
3   seema has a ball    NaN  NaN
4  raj has a ball       raj  4.0

Upvotes: 1

Related Questions