KHALILI Mohammed

Reputation: 1

Python pandas 'reverse' split in a dataframe

I have a dataframe with a column called details that contains data like this:

130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -

To get the first value, 130, I did this:

df['superficie'] = df['details'].str.split('m²').str[0]

This gives me 130 in a new column called "superficie".

For the second value, I did this:

df['nbPieces']= (df['details'].str.split('-').str[1].str.split('Pièces').str[0])

This gives me 3 in a new column called "nbPieces".

My problem is getting the 2 from "Chambres", the 2 from "Salles de bains", and the 20-30 next to "ans". How can I do that? I need to add them as new columns (nbChambre, nbSalleDeBain, NbAnnee).

Thanks in advance.

Upvotes: 0

Views: 112

Answers (1)

César Debeunne

Reputation: 518

I suggest using regular expressions in pandas for this kind of operation:

import pandas as pd

df = pd.DataFrame()
df['details'] = ["130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -"]

# Split each row once on " - " so that the range "20-30" stays in a single field
fields = df['details'].str.split(" - ")

# For each field of interest, keep the first integer found in it
df['nb_chbr'] = fields.str[2].str.findall(r'\d+').str[0].astype('int64')

df['nb_sdb'] = fields.str[3].str.findall(r'\d+').str[0].astype('int64')

# Only the lower bound of "20-30" is kept here
df['nb_annee'] = fields.str[5].str.findall(r'\d+').str[0].astype('int64')

print(df)

Output:

                                             details  nb_chbr  nb_sdb  nb_annee
0  130 m² - 3 Pièces - 2 Chambres - 2 Salles de b...        2       2        20

Note that I used " - " as the split string, which returns a cleaner list in your case. For the "Nombre d'années" field, I simply took the first integer that appears in the list; I don't know whether that suits you.
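If you need both ends of the age range rather than only the first integer, something along these lines might work. It is only a sketch: the column names annee_min and annee_max are made up for illustration, and it assumes the age always appears as two integers joined by a hyphen.

```python
import pandas as pd

df = pd.DataFrame({
    "details": ["130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -"]
})

# Splitting on a bare "-" also cuts the age range "20-30" in two
parts_dash = df["details"].str.split("-").str[5]

# Splitting on " - " keeps the age range together in one field
parts_spaced = df["details"].str.split(" - ").str[5]

# Capture both bounds of the range with named groups.
# annee_min / annee_max are illustrative names, not from the question.
age = parts_spaced.str.extract(r"(?P<annee_min>\d+)-(?P<annee_max>\d+)")
age = age.apply(pd.to_numeric)

df = df.join(age)
print(df[["annee_min", "annee_max"]])
```

With a bare "-" as the separator, the sixth piece contains only the 20, which is why " - " is the safer choice for this data.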

Finally, there may be a problem in your dataframe: 2 chambres and 2 salles de bains should make it a 4 pièces flat ^^
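As a variant of the positional split, you could also key each regular expression on the label itself, so that a row with a missing field does not shift the others. This is a rough sketch, not a tested solution for your full dataset: the second row is invented to show the missing-field case, and the patterns assume the labels are always spelled as in your example.

```python
import pandas as pd

df = pd.DataFrame({
    "details": [
        "130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -",
        "45 m² - 2 Pièces - 1 Chambre - Bon état -",  # invented row with missing fields
    ]
})

# One pattern per target column, anchored on the label rather than the position
patterns = {
    "superficie": r"(\d+)\s*m²",
    "nbPieces": r"(\d+)\s*Pièces?",
    "nbChambre": r"(\d+)\s*Chambres?",
    "nbSalleDeBain": r"(\d+)\s*Salles? de bains?",
    "NbAnnee": r"(\d+-\d+)\s*ans",  # kept as text, e.g. "20-30"
}

# str.extract returns the first capture group, or NaN when nothing matches
for column, pattern in patterns.items():
    df[column] = df["details"].str.extract(pattern, expand=False)

# Convert the plain counts to numbers; columns with missing values become floats
for column in ["superficie", "nbPieces", "nbChambre", "nbSalleDeBain"]:
    df[column] = pd.to_numeric(df[column])

print(df.drop(columns="details"))
```

Rows where a label is absent simply get NaN in that column instead of raising an error or picking up the wrong field.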

Upvotes: 1
