Reputation: 13
I have following data:
id date oked_1 oked_2 KPS address type
225 001041004832 2000-10-12 71209 01111 105 196430100 3
225 001041004832 2000-10-12 71209 46211 105 196430100 3
225 001041004832 2000-10-12 71209 52101 105 196430100 3
i need to move "oked_2" to "oked_1" in a way that all other columns have to replicated. For example, below you can see how oked_2 values are copied to oked_1, while other column data are the same. I want to have only oked_1 for my final dataframe(all oked_2 data have to be moved to oked_1).I expect:
id date oked_1 oked_2 KPS address type
225 001041004832 2000-10-12 71209 01111 105 196430100 3
225 001041004832 2000-10-12 01111 46211 105 196430100 3
225 001041004832 2000-10-12 46211 52101 105 196430100 3
225 001041004832 2000-10-12 52101 52101 105 196430100 3
How can I do that? I have not tried, because I do not have any clue how to process it...
If you see the expected dataframe, you can clearly notice that values from oked_2 are copied to oked_1. Furthermore, because one row was added because there was 3 different values in oked_2 and one was in oked_1. Total 4 unique values.
Upvotes: 0
Views: 83
Reputation: 247
from io import StringIO
import pandas as pd
data = """
_ id date oked_1 oked_2 KPS address type
225 001041004832 2000-10-12 71209 01111 105 196430100 3
225 001041004832 2000-10-12 71209 46211 105 196430100 3
225 001041004832 2000-10-12 71209 52101 105 196430100 3
"""
df = pd.read_csv(StringIO(data), dtype=str, delim_whitespace=True)
df['oked_1'] = df[['oked_1', 'oked_2']].to_numpy().tolist()
df = (df.explode('oked_1')
.drop_duplicates('oked_1', ignore_index=True)
.drop('oked_2', axis=1)
)
Output for df
:
_ id date oked_1 KPS address type
0 225 001041004832 2000-10-12 71209 105 196430100 3
1 225 001041004832 2000-10-12 01111 105 196430100 3
2 225 001041004832 2000-10-12 46211 105 196430100 3
3 225 001041004832 2000-10-12 52101 105 196430100 3
Upvotes: 1
Reputation: 3987
You can try this:
import pandas as pd
df=pd.DataFrame({"oked_1":["71209","71209","71209"],"oked_2":["01111","46211","52101"]})
print(df)
"""
oked_1 oked_2
0 71209 01111
1 71209 46211
2 71209 52101
"""
df.loc[len(df.index)] = df.loc[len(df.index)-1]
df["aa"]=pd.unique(df[["oked_1","oked_2"]].values.ravel('K'))
print(df)
"""
oked_1 oked_2
0 71209 01111
1 01111 46211
2 46211 52101
3 52101 52101
"""
I don't think I have completely understood you logic but it is giving expected result, as I understand.
Edit: I have tested it with this dataset:
id,date,oked_1,oked_2,KPS,address,type
001041004832,2000-10-12,71209,01111,105,196430100,3
001041004832,2000-10-12,71209,46211,105,196430100,3
001041004832,2000-10-12,71209,52101,105,196430100,3
And the output is:
id date oked_1 oked_2 KPS address type
0 1041004832 2000-10-12 71209 1111 105 196430100 3
1 1041004832 2000-10-12 71209 46211 105 196430100 3
2 1041004832 2000-10-12 71209 52101 105 196430100 3
id date oked_1 oked_2 KPS address type
0 1041004832 2000-10-12 71209 1111 105 196430100 3
1 1041004832 2000-10-12 1111 46211 105 196430100 3
2 1041004832 2000-10-12 46211 52101 105 196430100 3
3 1041004832 2000-10-12 52101 52101 105 196430100 3
And it is working as expected!
Upvotes: 2
Reputation: 501
You can create separate data frames for oked_1 and oked_2 and then drop duplicates & combine the dataframe. As shown below.
df = pd.read_csv(filepath, dtype = str) #this is your main dataframe
df1 = df.drop(columns = ['oked_2']).drop_duplicates(subset=['oked_1'])
df2 = df.drop(columns = ['oked_1']).drop_duplicates(subset=['oked_2']).rename(columns = {'oked_2': 'oked_1'})
data = pd.concat([df1,df2]).reset_index()
print(data)
which looks like this
index id date oked_1 KPS address type
0 0 1041004832 2000-10-12 71209 105 196430100 3
1 0 1041004832 2000-10-12 01111 105 196430100 3
2 1 1041004832 2000-10-12 46211 105 196430100 3
3 2 1041004832 2000-10-12 52101 105 196430100 3
Upvotes: 0