brezz
brezz

Reputation: 65

Filtering by Strings in dataframe and adding separate values

given this dataframe is it possible to look for particular strings such as the countries that are located inside the countries list? (For example for the first index in 'Country', it has the word Japan inside it and its corresponding value will be 1). Is it possible to sum up the value that corresponds to each country? (End result: Japan: 1+3=4 USA:2 Europe:4)

countries=["Europe","USA","Japan"]
df=pd.DataFrame={'Employees':[1,2,3,4],
                 'Country':['Japan;Security','USA;Google',"Japan;Sega","Europe;Google"]}
print(df)

Thanks

Upvotes: 2

Views: 55

Answers (2)

Nk03
Nk03

Reputation: 14949

If you wanna use only those values specified in the country list. You can do something like this -

patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extract(pat=patt, expand=False).values
new_df = df.groupby(grp).agg({'Employees': sum})

For example, if the initial country list is missing 'JAPAN' -

countries = ["Europe", "USA"]
patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extract(pat=patt, expand=False).values
new_df = df.groupby(grp, dropna=False).agg({'Employees': sum}).reset_index().rename(
    columns={'index': 'Country'}).fillna('other')

outptut-

  Country  Employees
0  Europe          4
1     USA          2
2   other          4 # see the change

Upvotes: 2

Code Different
Code Different

Reputation: 93161

Try this:

c = df['Country'].str.split(';', expand=True)[0].to_numpy()
df.groupby(c)['Employees'].sum()

Upvotes: 1

Related Questions