Reputation: 1620
I have a dataset and i want to take rows if a value match in code column and also take only preselected columns, maybe i'm not clear what i need please see the code below.
This is my main dataframe:
code des price year brand test test1 test2
0 code1 des1 price1 year1 brand1 xx xx xx
1 code2 des2 price2 year2 brand2 xx xx xx
2 code3 des3 price3 year3 brand3 xx xx xx
3 code4 des4 price4 year4 brand4 xx xx xx
4 code5 des5 price5 year5 brand5 xx xx xx
5 code6 des6 price6 year6 brand6 xx xx xx
6 code114 xx xx xx xx xx xxx xx
7 code274 xx xx xx xx xx xx xx
8 code43 xx xx xx xx xx xx xx
9 code478 xx xx xx xx xx xxx xx
10 code565 xx xx xx xx xx xx xx
11 code6789 xx xx xx xx xx xxx xx
These are the codes that i want to match and get that rows from main dataframe:
['code1','code2','code3','code4','code114','code478','code6789']
These are the colums that i want to get only:
['des','price','brand','test1']
This is my empty dataset where i want to keep the data after extract from main dataframe:
code des price year brand size test test1 test2
0
This is what i want as output:
code des price year brand size test test1 test2
0 code1 des1 price1 brand1 xx
1 code2 des2 price2 brand2 xx
2 code3 des3 price3 brand3 xx
3 code4 des4 price4 brand4 xx
4 code114 xx xx xx xxx
5 code478 xx xx xx xxx
6 code6789 xx xx xx xxx
Here is the code i wrote so far, but can't understand how can i solve this, if someone help that would be great, thanks in advance.
import pandas as pd
import numpy as np
code = ['code1','code2','code3','code4','code114','code478','code6789']
columns = ['des','price','brand','test1']
source_data = {
'code': ['code1','code2','code3','code4','code5','code6','code114','code274','code43','code478','code565','code6789'],
'des': ['des1','des2','des3','des4','des5','des6','xx','xx','xx','xx','xx','xx'],
'price': ['price1','price2','price3','price4','price5','price6','xx','xx','xx','xx','xx','xx'],
'year': ['year1','year2','year3','year4','year5','year6','xx','xx','xx','xx','xx','xx'],
'brand': ['brand1','brand2','brand3','brand4','brand5','brand6','xx','xx','xx','xx','xx','xx'],
'test' : ['xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx'],
'test1' : ['xx','xx','xx','xx','xx','xx','xxx','xx','xx','xxx','xx','xxx'],
'test2' : ['xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx','xx'],
}
destination_data = {
'code': [''],
'des': [''],
'price': [''],
'year': [''],
'brand': [''],
'size': [''],
'test' : [''],
'test1' : [''],
'test2' : [''],
}
df = pd.DataFrame.from_dict(destination_data)
print(df)
Upvotes: 0
Views: 37
Reputation: 195543
Try:
to_match = ['code1','code2','code3','code4','code114','code478','code6789']
cols = ['code', 'des','price','brand','test1']
out = df.loc[df.code.isin(to_match), cols]
out = out.reindex(df.columns, axis=1).fillna('')
print(out)
Prints:
code des price year brand test test1 test2
0 code1 des1 price1 brand1 xx
1 code2 des2 price2 brand2 xx
2 code3 des3 price3 brand3 xx
3 code4 des4 price4 brand4 xx
6 code114 xx xx xx xxx
9 code478 xx xx xx xxx
EDIT: Without the empty columns:
to_match = ['code1','code2','code3','code4','code114','code478','code6789']
cols = ['code', 'des','price','brand','test1']
out = df.loc[df.code.isin(to_match), cols]
print(out)
Prints:
code des price brand test1
0 code1 des1 price1 brand1 xx
1 code2 des2 price2 brand2 xx
2 code3 des3 price3 brand3 xx
3 code4 des4 price4 brand4 xx
6 code114 xx xx xx xxx
9 code478 xx xx xx xxx
11 code6789 xx xx xx xxx
Upvotes: 1
Reputation: 18426
Just create the masking using isin
method of the pandas Series, and take only the columns you want.
df[df['code'].isin(['code1','code2','code3','code4','code114','code478','code6789'])][['des','price','brand','test1']]
OUTPUT:
des price brand test1
0 des1 price1 brand1 xx
1 des2 price2 brand2 xx
2 des3 price3 brand3 xx
3 des4 price4 brand4 xx
6 xx xx xx xxx
9 xx xx xx xxx
11 xx xx xx xxx
Upvotes: 0