user12809368

Adding new column with condition

I need to extend a data frame by adding more columns. A sample of my data, with its headers:

Date      Sentence
28 Jan    who.c
30 Jan    house.a
02 Feb    eurolet.it

I need to add another column, Tp, that assigns a value to each link.

I wrote the following:

conditions = [df['Sentence'].str.endswith(original), df['Sentence'].str.endswith(country)]
choices = [original, country]
# df['Tp'] = df.apply(lambda row: urlparse(row['Sentence']).netloc, axis = 1)
df['Tp'] = np.select(conditions, choices, default ='Unknown')
print(df)

where

original= [('a', 'apartment'), ('b', 'bungalow'), ('c', 'church')]

and

country = [('UK', 'United Kingdom'), ('IT', 'Italy'), ('DE', 'Germany'), ('H', 'Holland'), ..., ('F', 'France'), ('S', 'Spain')]

country contains more than 50 elements.

Could you tell me how to fix it? The column should be added to the data frame, which is then written to a CSV file.

Thanks

Update:

           Sentences
    0
    1          who.c
    2    citta.me.it
    3      office.of
    4     eurolet.eu
    ..           ...
    995     uilpa.ie
    996       fog.de

original and country come from:

list_country=np.array(country).tolist()
list_country_name=np.array(country_name).tolist()
flat_name_country = [item for sublist in list_country for item in sublist]
flat_country_name = [item for sublist in list_country_name for item in sublist] 

zip_domains=list(zip(flat_name_country, flat_country_name))

Upvotes: 1

Views: 81

Answers (2)

Umar.H

Reputation: 23099

First, let's make some dictionaries from your tuples and combine them:

country = {k.lower() : v for (k,v) in country}
og = {k : v for (k,v) in original}
country.update(og)

print(country)

{'uk': 'United Kingdom',
 'it': 'Italy',
 'de': 'Germany',
 'h': 'Holland',
 'f': 'France',
 's': 'Spain',
 'a': 'apartment',
 'b': 'bungalow',
 'c': 'church'}

Then let's split on the full stop and keep the last element. This way any earlier full stops in your text are ignored and only the final element is looked at. Finally, we use .map to associate your values.

df['value'] = df["Sentence"].str.split(".", expand=True).stack().reset_index(1).query(
    "level_1 == level_1.max()"
)[0].map(country)

print(df)

     Date    Sentence      value
0  28 Jan       who.c     church
1  30 Jan     house.a  apartment
2  02 Feb  eurolet.it      Italy
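One caveat: the query above compares against the largest split position in the whole frame, so links with fewer dots than the longest one (for example who.c next to citta.me.it) would end up as NaN. A minimal sketch of an alternative, assuming the suffix is always whatever follows the last dot; the shortened country list, the lookup name and the 'Unknown' fallback are choices made for this illustration only:

```python
import pandas as pd

# Links taken from the question and its update, including an empty row.
df = pd.DataFrame({"Sentence": ["who.c", "house.a", "eurolet.it", "citta.me.it", ""]})

original = [("a", "apartment"), ("b", "bungalow"), ("c", "church")]
# Shortened for the example; the real list has more than 50 entries.
country = [("UK", "United Kingdom"), ("IT", "Italy"), ("DE", "Germany")]

# One lower-cased lookup table for both lists.
lookup = {code.lower(): name for code, name in country}
lookup.update(dict(original))

# Split once from the right so only the text after the last dot is kept.
suffix = df["Sentence"].str.rsplit(".", n=1).str[-1].str.lower()

# Unmatched suffixes become NaN after map; replace them with a default label.
df["value"] = suffix.map(lookup).fillna("Unknown")

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = df.to_csv(index=False)
print(df)
```

Because the split is done per row, the number of dots in other rows no longer matters.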

Upvotes: 0

Spandan Brahmbhatt

Reputation: 4044

Can you convert your original and country lists into dicts?

original= [('a', 'apartment'), ('b', 'bungalow'), ('c', 'church')]
original = {x:y for x,y in original}
country = [('UK', 'United Kingdom'), ('IT', 'Italy'), ('DE', 'Germany'), ('H', 'Holland'), ..., ('F', 'France'), ('S', 'Spain')]
country = {x:y for x,y in country}

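Now you can perform the same task as:

# Use the text after the last dot; sen[-1] would only return the final character.
def suffix(sen):
    return sen.rsplit('.', 1)[-1]

# The country keys are upper case, so the suffix is upper-cased for that lookup.
df['Tp'] = df['Sentence'].apply(lambda sen: original.get(suffix(sen), country.get(suffix(sen).upper(), 'unknown')))

In your code, np.select needs conditions and choices to contain the same number of elements (and, by extension, so would original and country).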
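If you would rather keep np.select, one way to satisfy that requirement is to build one boolean condition and one label per suffix. This is only a sketch under a few assumptions: the country list is shortened, the codes are lower-cased to match the links, and the suffix is expected to follow a dot.

```python
import numpy as np
import pandas as pd

# Links taken from the question; office.of has no matching suffix.
df = pd.DataFrame({"Sentence": ["who.c", "house.a", "eurolet.it", "office.of"]})

original = [("a", "apartment"), ("b", "bungalow"), ("c", "church")]
# Shortened for the example; the real list has more than 50 entries.
country = [("UK", "United Kingdom"), ("IT", "Italy"), ("DE", "Germany")]

# A single list of (suffix, label) pairs, with country codes lower-cased.
pairs = original + [(code.lower(), name) for code, name in country]

# One boolean Series per suffix and one label per suffix, in the same order.
conditions = [df["Sentence"].str.endswith("." + sfx) for sfx, _ in pairs]
choices = [name for _, name in pairs]

# The first matching condition wins; rows with no match get the default.
df["Tp"] = np.select(conditions, choices, default="Unknown")
print(df)
```

With more than 50 suffixes this builds more than 50 boolean columns, so the dictionary lookup is likely the lighter option; the sketch mainly shows how conditions and choices have to line up.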

Upvotes: 1
