BKS

Reputation: 2333

How to replace strings containing a specific value with another string in a pandas DataFrame column

I have a dataframe

AID   Name      Country
1     XX-USA    FL, USA
2     YY        USA-USA
3     YY-USA    UK

I want to replace every value in Country (and only in Country) that contains the string 'USA' with just 'USA'.

So the result would be

AID   Name      Country
1     XX-USA    USA
2     YY        USA
3     YY-USA    UK

I tried df.replace but couldn't figure out how to do it.

Upvotes: 1

Views: 58

Answers (1)

MaxU - stand with Ukraine

Reputation: 210982

UPDATE: I don't know what I am doing "wrong", but it works just fine for me:

In [83]: fn = r'D:\download\countries.csv'

In [84]: Alldf = pd.read_csv(fn, sep=',')

In [85]: pd.options.display.max_rows = 10

In [86]: Alldf
Out[86]:
            aid             name   country
0      79533B41   john sarracino       USA
1      7E0706FD   ben wiedermann       USA
2      7B33445B     rishit sheth       USA
3      7F4CE233       yijun zhao       USA
4      262087DD     roni khardon       USA
...         ...              ...       ...
13387  7C62148F    marcel kutsch       USA
13388  7F42F95A     john z zhang    Canada
13389  7DAF3AED     chris sanden    Canada
13390  00375817  michal laclavik  Slovakia
13391  13D9B371     marek ciglan  Slovakia

[13392 rows x 3 columns]

In [87]: Alldf.loc[Alldf.country.str.contains('USA'), 'country'] = 'USA'

In [88]: Alldf.country.value_counts()
Out[88]:
USA                    6150
China                  1426
Singapore               517
UK                      448
Germany                 398
                       ...
New York Unversity        1
Berkeley and QUT          1
CA 94022                  1
WA 98109                  1
and Google research       1
Name: country, Length: 377, dtype: int64

In [89]: Alldf.loc[Alldf.country.str.contains('USA'), 'country'].unique()
Out[89]: array(['USA'], dtype=object)
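If the same line fails on a real file, one possible cause is empty cells in the country column. That is an assumption, since the output above doesn't show any. For a missing value, str.contains returns a missing value rather than True/False, and using such a mask with .loc can raise an error. A minimal sketch with made-up data, passing na=False so missing values count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing country (not taken from the real CSV)
df = pd.DataFrame({"country": ["FL, USA", np.nan, "UK", "USA-USA"]})

# na=False treats missing values as "does not contain 'USA'",
# so the mask holds only True/False; regex=False does a plain substring test
mask = df["country"].str.contains("USA", regex=False, na=False)

# Overwrite only the matching cells; the missing value and 'UK' stay untouched
df.loc[mask, "country"] = "USA"
```

The missing cell stays missing here. Filling it first (for example with fillna) would be a separate decision.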

Old answer:

In [74]: d
Out[74]:
   AID    Name  Country
0    1  XX-USA  FL, USA
1    2      YY  USA-USA
2    3  YY-USA       UK

In [75]: d.loc[d.Country.str.contains('USA'), 'Country'] = 'USA'

In [76]: d
Out[76]:
   AID    Name Country
0    1  XX-USA     USA
1    2      YY     USA
2    3  YY-USA      UK
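For anyone who wants to run this outside an IPython session, here is a self-contained sketch that rebuilds the sample frame from the question. Since the question mentions df.replace, it also shows a regex-based replace that should give the same result on this data. The names d2 and mask are just illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question
d = pd.DataFrame({
    "AID": [1, 2, 3],
    "Name": ["XX-USA", "YY", "YY-USA"],
    "Country": ["FL, USA", "USA-USA", "UK"],
})
d2 = d.copy()  # second copy for the replace-based variant

# Variant 1: boolean mask on Country only, then overwrite the matching cells
mask = d["Country"].str.contains("USA", regex=False)
d.loc[mask, "Country"] = "USA"

# Variant 2: Series.replace with a regex that matches the whole cell,
# so any value containing 'USA' is replaced by 'USA' in one step
d2["Country"] = d2["Country"].replace(r"^.*USA.*$", "USA", regex=True)

print(d)
print(d2)
```

Both variants touch only the Country column, so Name values such as 'XX-USA' are left as they are.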

Upvotes: 2
