Soumyaansh

Reputation: 8978

Python: How to split a string column in a dataframe?

I have a dataframe with two columns: Date and Location (object dtype). The Location column looks like this:

 Date                                            Location
1     07/12/1912                            AtlantiCity, New Jersey   
2     08/06/1913                 Victoria, British Columbia, Canada   
3     09/09/1913                                 Over the North Sea   
4     10/17/1913                         Near Johannisthal, Germany   
5     03/05/1915                                    Tienen, Belgium   
6     09/03/1915                              Off Cuxhaven, Germany   
7     07/28/1916                              Near Jambol, Bulgeria   
8     09/24/1916                                Billericay, England   
9     10/01/1916                               Potters Bar, England   
10    11/21/1916                                     Mainz, Germany

My requirement is to split Location on the "," separator and keep only the last part (e.g. New Jersey, Canada, Germany, England) in the Location column. I also have to check for values that contain only a single element (i.e. values with no ",").

Is there a built-in method that does this without looping over every row?

Sorry if the question is below the usual standard; I am new to Python and still learning.

Upvotes: 3

Views: 1729

Answers (2)

akrun

Reputation: 886948

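We could try str.extract with a regex that captures everything after the last comma (or the whole string when there is no comma):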

print(df['Location'].str.extract(r'([^,]+$)'))    
#0            New Jersey
#1                Canada
#2    Over the North Sea
#3               Germany
#4              Belgium 
#5               Germany
#6              Bulgeria
#7               England
#8               England
#9               Germany
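As a hedged variation on the same idea: the pattern above also captures the space that follows the comma, and str.extract returns a DataFrame by default. The sketch below assumes a small sample frame built from the question's data, passes expand=False to get a Series back, and strips the surrounding whitespace. The name last_part is only illustrative.

```python
import pandas as pd

# Sample rows taken from the question; the real frame has more rows.
df = pd.DataFrame({
    "Location": [
        "AtlantiCity, New Jersey",
        "Victoria, British Columbia, Canada",
        "Over the North Sea",
    ]
})

# Capture the run of non-comma characters at the end of each string.
# expand=False returns a Series instead of a one-column DataFrame.
last_part = df["Location"].str.extract(r"([^,]+)$", expand=False)

# The capture still includes the space after the comma, so trim it.
last_part = last_part.str.strip()

print(last_part)
```

Values without a comma pass through unchanged, because the whole string is then the final run of non-comma characters.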

Upvotes: 1

akuiper

Reputation: 214927

A straightforward way is to apply the split method to each element of the column and pick the last piece:

df.Location.apply(lambda x: x.split(",")[-1])

1             New Jersey
2                 Canada
3     Over the North Sea
4                Germany
5                Belgium
6                Germany
7               Bulgeria
8                England
9                England
10               Germany
Name: Location, dtype: object
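If you would rather avoid the Python-level lambda, the same result can probably be had with the vectorized string accessor. This is a minimal sketch, assuming a sample frame built from the question's data; it splits once from the right, takes the last piece, and strips the space left behind by the ", " separator before writing the result back to Location.

```python
import pandas as pd

# Sample rows taken from the question; the real frame has more rows.
df = pd.DataFrame({
    "Date": ["07/12/1912", "08/06/1913", "09/09/1913"],
    "Location": [
        "AtlantiCity, New Jersey",
        "Victoria, British Columbia, Canada",
        "Over the North Sea",
    ],
})

# rsplit with n=1 splits only at the last comma, so each value yields
# at most two pieces; .str[-1] picks the final one.
df["Location"] = (
    df["Location"]
    .str.rsplit(",", n=1)
    .str[-1]
    .str.strip()  # drop the space that followed the comma
)

print(df)
```

A value with no comma produces a one-element list, so .str[-1] simply returns the original string.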

To check whether a cell has only one element, we can use the str.contains method on the column. It returns True where a "," is present, so False marks the single-element values:

df.Location.str.contains(",")

1      True
2      True
3     False
4      True
5      True
6      True
7      True
8      True
9      True
10     True
Name: Location, dtype: bool
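One way the mask might be put to use is to treat single-element values differently from the rest. The sketch below assumes a sample frame built from the question's data and a hypothetical new column named Region; it keeps the last piece only for rows that actually contain a comma and leaves the others as missing. Whether single-element rows should be kept, blanked, or dropped is a choice the question leaves open.

```python
import pandas as pd

# Sample rows taken from the question; the real frame has more rows.
df = pd.DataFrame({
    "Location": [
        "AtlantiCity, New Jersey",
        "Victoria, British Columbia, Canada",
        "Over the North Sea",
    ]
})

# regex=False treats "," as a literal character rather than a pattern.
has_comma = df["Location"].str.contains(",", regex=False)

# Last comma-separated piece, with surrounding whitespace removed.
last_piece = df["Location"].str.rsplit(",", n=1).str[-1].str.strip()

# "Region" is a hypothetical column name; rows without a comma get NaN.
df["Region"] = last_piece.where(has_comma)

# Rows that hold only a single element.
single_element_rows = df[~has_comma]

print(df)
print(single_element_rows)
```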

Upvotes: 3
