Sohan Das
Sohan Das

Reputation: 1620

Match string from column and pick preselected columns

I have a dataset and i want to take rows if a value match in code column and also take only preselected columns, maybe i'm not clear what i need please see the code below.

This is my main dataframe:

        code   des   price   year   brand test test1 test2
0      code1  des1  price1  year1  brand1   xx    xx    xx
1      code2  des2  price2  year2  brand2   xx    xx    xx
2      code3  des3  price3  year3  brand3   xx    xx    xx
3      code4  des4  price4  year4  brand4   xx    xx    xx
4      code5  des5  price5  year5  brand5   xx    xx    xx
5      code6  des6  price6  year6  brand6   xx    xx    xx
6    code114    xx      xx     xx      xx   xx   xxx    xx
7    code274    xx      xx     xx      xx   xx    xx    xx
8     code43    xx      xx     xx      xx   xx    xx    xx
9    code478    xx      xx     xx      xx   xx   xxx    xx
10   code565    xx      xx     xx      xx   xx    xx    xx
11  code6789    xx      xx     xx      xx   xx   xxx    xx

These are the codes that i want to match and get that rows from main dataframe:

['code1','code2','code3','code4','code114','code478','code6789']

These are the colums that i want to get only:

['des','price','brand','test1']

This is my empty dataset where i want to keep the data after extract from main dataframe:

    code des price year brand size test test1 test2
0

This is what i want as output:

       code   des   price year   brand size test test1 test2
0     code1  des1  price1       brand1              xx
1     code2  des2  price2       brand2              xx
2     code3  des3  price3       brand3              xx
3     code4  des4  price4       brand4              xx
4   code114    xx      xx           xx             xxx
5   code478    xx      xx           xx             xxx
6  code6789    xx      xx           xx             xxx

Here is the code i wrote so far, but can't understand how can i solve this, if someone help that would be great, thanks in advance.

import pandas as pd
import numpy as np



code = ['code1','code2','code3','code4','code114','code478','code6789']

columns = ['des','price','brand','test1']


source_data = {
'code': ['code1','code2','code3','code4','code5','code6','code114','code274','code43','code478','code565','code6789'],
'des': ['des1','des2','des3','des4','des5','des6','xx','xx','xx','xx','xx','xx'],
'price': ['price1','price2','price3','price4','price5','price6','xx','xx','xx','xx','xx','xx'],
'year': ['year1','year2','year3','year4','year5','year6','xx','xx','xx','xx','xx','xx'],
'brand': ['brand1','brand2','brand3','brand4','brand5','brand6','xx','xx','xx','xx','xx','xx'],
'test' : ['xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx'],
'test1' : ['xx','xx','xx','xx','xx','xx','xxx','xx','xx','xxx','xx','xxx'],
'test2' : ['xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx'],
}


destination_data = {
'code': [''],
'des': [''],
'price': [''],
'year': [''],
'brand': [''],
'size': [''],
'test' : [''],
'test1' : [''],
'test2' : [''],
}


df = pd.DataFrame.from_dict(destination_data)
print(df)

Upvotes: 0

Views: 37

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195543

Try:

to_match = ['code1','code2','code3','code4','code114','code478','code6789']
cols = ['code', 'des','price','brand','test1']

out = df.loc[df.code.isin(to_match), cols]
out = out.reindex(df.columns, axis=1).fillna('')
print(out)

Prints:

        code   des   price year   brand test test1 test2
0      code1  des1  price1       brand1         xx      
1      code2  des2  price2       brand2         xx      
2      code3  des3  price3       brand3         xx      
3      code4  des4  price4       brand4         xx      
6    code114    xx      xx           xx        xxx      
9    code478    xx      xx           xx        xxx      

EDIT: Without the empty columns:

to_match = ['code1','code2','code3','code4','code114','code478','code6789']
cols = ['code', 'des','price','brand','test1']

out = df.loc[df.code.isin(to_match), cols]
print(out)

Prints:

        code   des   price   brand test1
0      code1  des1  price1  brand1    xx
1      code2  des2  price2  brand2    xx
2      code3  des3  price3  brand3    xx
3      code4  des4  price4  brand4    xx
6    code114    xx      xx      xx   xxx
9    code478    xx      xx      xx   xxx
11  code6789    xx      xx      xx   xxx

Upvotes: 1

ThePyGuy
ThePyGuy

Reputation: 18426

Just create the masking using isin method of the pandas Series, and take only the columns you want.

df[df['code'].isin(['code1','code2','code3','code4','code114','code478','code6789'])][['des','price','brand','test1']]

OUTPUT:

     des   price   brand test1
0   des1  price1  brand1    xx
1   des2  price2  brand2    xx
2   des3  price3  brand3    xx
3   des4  price4  brand4    xx
6     xx      xx      xx   xxx
9     xx      xx      xx   xxx
11    xx      xx      xx   xxx

Upvotes: 0

Related Questions