jimmy
jimmy

Reputation: 350

Combine two columns with different data types Pandas

I have the following dataframe:

id  variable      quantitative_v   cualitative_v
a      gdp             1          Nan
a      exports         12         Nan
a      imports         43         Nan
a      category        Nan        A
a      developed       Nan        Yes
b      gdp             1          Nan
b      exports         12         Nan
b      imports         43         Nan
b      category        Nan        A
b      developed       Nan        Yes

The df I expect:

id  variable     value  
a      gdp         1    
a      exports     12   
a      imports     43   
a      category    A
a      developed   Yes
b      gdp         1
b      exports     12 
b      imports     43 
b      category    A
b      developed   Yes

How could I do that? I tried to sum, but that didn't work because they're different type of data

Upvotes: 0

Views: 1370

Answers (2)

Rob Raymond
Rob Raymond

Reputation: 31146

Your data is structured such that NaN is showing in one column or the other. Better to update one column from the other.

df = pd.DataFrame({"id":["a","a","a","a","a","b","b","b","b","b"],"variable":["gdp","exports","imports","category","developed","gdp","exports","imports","category","developed"],"quantitative_v":[1,12,43,"Nan"," Nan",1,12,43,"Nan"," Nan"],"cualitative_v":["Nan"," Nan"," Nan","A","Yes","Nan"," Nan"," Nan","A","Yes"]})
df = df.replace({"Nan":np.nan, " Nan":np.nan}).reset_index()
mask = df["quantitative_v"].isna()
df.loc[mask,("quantitative_v")] = df[mask]["cualitative_v"]
df = df.drop("cualitative_v",1).rename({"quantitative_v":"value"},axis=1)
print(df.to_string(index=False))

output

 index id   variable value
     0  a        gdp     1
     1  a    exports    12
     2  a    imports    43
     3  a   category     A
     4  a  developed   Yes
     5  b        gdp     1
     6  b    exports    12
     7  b    imports    43
     8  b   category     A
     9  b  developed   Yes

Upvotes: 0

Space Impact
Space Impact

Reputation: 13255

Use fillna or combine_first:

df['quantitative_v'] = df['quantitative_v'].astype(float)
df['quantitative_v'].fillna(df['cualitative_v'])

df['quantitative_v'].combine_first(df['cualitative_v'])

Upvotes: 1

Related Questions