Ashish Kumar
Ashish Kumar

Reputation: 233

Pandas dataframe Split One column data into 2 using some condition

I have one dataframe which is below-

             0  
    ____________________________________
0     Country| India  
60        Delhi  
62       Mumbai  
68       Chennai  
75    Country| Italy  
78        Rome  
80       Venice  
85        Milan  
88    Country| Australia  
100      Sydney  
103      Melbourne  
107      Perth  

I want to Split the data in 2 columns so that in one column there will be country and on other there will be city. I have no idea where to start with. I want like below-

             0                    1
    ____________________________________
0     Country| India           Delhi
1     Country| India           Mumbai
2     Country| India           Chennai         
3    Country| Italy           Rome
4    Country| Italy           Venice   
5    Country| Italy           Milan        
6    Country| Australia       Sydney
7   Country| Australia       Melbourne
8   Country| Australia       Perth     

Any Idea how to do this?

Upvotes: 3

Views: 102

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

Look for rows where | is present and pull into another column, and fill down on the newly created column :

(
    df.rename(columns={"0": "city"})
    # this looks for rows that contain '|' and puts them into a 
    # new column called Country. rows that do not match will be
    # null in the new column.
    .assign(Country=lambda x: x.loc[x.city.str.contains("\|"), "city"])
    # fill down on the Country column, this also has the benefit
    # of linking the Country with the City, 
    .ffill()
    # here we get rid of duplicate Country entries in city and Country
    # this ensures that only Country entries are in the Country column
    # and cities are in the City column
    .query("city != Country")
    # here we reverse the column positions to match your expected output 
    .iloc[:, ::-1]
)


      Country           city
60  Country| India      Delhi
62  Country| India      Mumbai
68  Country| India      Chennai
78  Country| Italy      Rome
80  Country| Italy      Venice
85  Country| Italy      Milan
100 Country| Australia  Sydney
103 Country| Australia  Melbourne
107 Country| Australia  Perth

Upvotes: 3

jezrael
jezrael

Reputation: 862641

Use DataFrame.insert with Series.where and Series.str.startswith for replace not matched values to missing values with ffill for forward filling missing values and then remove rows with same values in both by Series.ne for not equal in boolean indexing:

df.insert(0, 'country', df[0].where(df[0].str.startswith('Country')).ffill())
df = df[df['country'].ne(df[0])].reset_index(drop=True).rename(columns={0:'city'})
print (df)
             country       city
0      Country|India      Delhi
1      Country|India     Mumbai
2      Country|India    Chennai
3      Country|Italy       Rome
4      Country|Italy     Venice
5      Country|Italy      Milan
6  Country|Australia     Sydney
7  Country|Australia  Melbourne
8  Country|Australia      Perth

Upvotes: 2

Related Questions