Reputation: 1
I have a DataFrame with a column called details that contains data like this:
130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -
To get the first value, 130, I did this:
df['superficie'] = df['details'].str.split('m²').str[0]
It gives me 130 in a new column called "superficie".
For the second value I did this:
df['nbPieces']= (df['details'].str.split('-').str[1].str.split('Pièces').str[0])
It gives me 3 in a new column called "nbPieces".
My problem is that I also want to get the 2 from "Chambres", the 2 from "Salles de bains" and the 20-30 next to "ans". How can I do that? I need to add them as new columns (nbChambre, nbSalleDeBain, NbAnnee).
Thanks in advance.
Upvotes: 0
Views: 112
Reputation: 518
I suggest using regular expressions in pandas for this kind of operation:
import pandas as pd
df = pd.DataFrame()
df['details'] = ["130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -"]
# Split each row on " - " once; the fields are then addressed by position
parts = df['details'].str.split(" - ")
# Take the first run of digits in the 3rd, 4th and 6th fields
df['nb_chbr'] = parts.str[2].str.findall(r'\d+').str[0].astype('int64')
df['nb_sdb'] = parts.str[3].str.findall(r'\d+').str[0].astype('int64')
df['nb_annee'] = parts.str[5].str.findall(r'\d+').str[0].astype('int64')
print(df)
Output:
                                             details  nb_chbr  nb_sdb  nb_annee
0  130 m² - 3 Pièces - 2 Chambres - 2 Salles de b...        2       2        20
Note that I used " - " (with the surrounding spaces) as the split string. It returns a cleaner list in your case, since it does not break "20-30" apart. For the "Nombre d'années" case I simply took the first integer that appears in that field (20 out of 20-30); I don't know if that suits you.
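If you would rather keep the whole "20-30" range and not depend on the position of each field, one possible alternative is to match on the labels with str.extract. This is only a sketch: the three regex patterns are my own assumptions based on the single sample row, and the column names are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "details": [
        "130 m² - 3 Pièces - 2 Chambres - 2 Salles de bains - Bon état - 20-30 ans -",
    ]
})

# Assumed patterns: each one anchors on the label text, so the position
# of the field inside the string does not matter.
patterns = {
    "nbChambre": r"(\d+)\s*Chambres?",
    "nbSalleDeBain": r"(\d+)\s*Salles? de bains?",
    "NbAnnee": r"(\d+-\d+)\s*ans",
}

for column, pattern in patterns.items():
    # With a single capture group and expand=False, str.extract returns
    # a Series; rows where the pattern does not match get a missing value.
    df[column] = df["details"].str.extract(pattern, expand=False)

# The two counts become nullable integers; the age range stays a string.
for column in ["nbChambre", "nbSalleDeBain"]:
    df[column] = pd.to_numeric(df[column]).astype("Int64")

print(df[["nbChambre", "nbSalleDeBain", "NbAnnee"]])
```

Rows where a label is missing end up with a missing value in that column instead of raising an error, which may or may not be what you want.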
Finally, there may be an inconsistency in your data: 2 chambres and 2 salles de bains should make it a 4 pièces flat ^^
Upvotes: 1