Uldana Duisenaly
Uldana Duisenaly

Reputation: 13

How to move data from one column to another?

I have following data:

            id                 date   oked_1     oked_2  KPS  address    type
225  001041004832         2000-10-12  71209     01111  105  196430100    3
225  001041004832         2000-10-12  71209     46211  105  196430100    3
225  001041004832         2000-10-12  71209     52101  105  196430100    3

i need to move "oked_2" to "oked_1" in a way that all other columns have to replicated. For example, below you can see how oked_2 values are copied to oked_1, while other column data are the same. I want to have only oked_1 for my final dataframe(all oked_2 data have to be moved to oked_1).I expect:

               id                 date   oked_1     oked_2  KPS  address    type
    225  001041004832         2000-10-12  71209     01111  105  196430100    3
    225  001041004832         2000-10-12  01111     46211  105  196430100    3
    225  001041004832         2000-10-12  46211     52101  105  196430100    3
    225  001041004832         2000-10-12  52101     52101  105  196430100    3

How can I do that? I have not tried, because I do not have any clue how to process it...

If you see the expected dataframe, you can clearly notice that values from oked_2 are copied to oked_1. Furthermore, because one row was added because there was 3 different values in oked_2 and one was in oked_1. Total 4 unique values.

Upvotes: 0

Views: 83

Answers (3)

sarrysyst
sarrysyst

Reputation: 247

from io import StringIO
import pandas as pd

data = """
_            id                 date   oked_1     oked_2  KPS  address    type
225  001041004832         2000-10-12  71209     01111  105  196430100    3
225  001041004832         2000-10-12  71209     46211  105  196430100    3
225  001041004832         2000-10-12  71209     52101  105  196430100    3
"""

df = pd.read_csv(StringIO(data), dtype=str, delim_whitespace=True)

df['oked_1'] = df[['oked_1', 'oked_2']].to_numpy().tolist()

df = (df.explode('oked_1')
        .drop_duplicates('oked_1', ignore_index=True)
        .drop('oked_2', axis=1)
     )

Output for df:

     _            id        date oked_1  KPS    address type
0  225  001041004832  2000-10-12  71209  105  196430100    3
1  225  001041004832  2000-10-12  01111  105  196430100    3
2  225  001041004832  2000-10-12  46211  105  196430100    3
3  225  001041004832  2000-10-12  52101  105  196430100    3

Upvotes: 1

imxitiz
imxitiz

Reputation: 3987

You can try this:

import pandas as pd

df=pd.DataFrame({"oked_1":["71209","71209","71209"],"oked_2":["01111","46211","52101"]})

print(df)
"""
   oked_1 oked_2
0  71209  01111
1  71209  46211
2  71209  52101
"""
df.loc[len(df.index)] = df.loc[len(df.index)-1] 
  
df["aa"]=pd.unique(df[["oked_1","oked_2"]].values.ravel('K'))
print(df)
"""
  oked_1  oked_2
0  71209  01111
1  01111  46211
2  46211  52101
3  52101  52101
"""

I don't think I have completely understood you logic but it is giving expected result, as I understand.

Edit: I have tested it with this dataset:

id,date,oked_1,oked_2,KPS,address,type
001041004832,2000-10-12,71209,01111,105,196430100,3
001041004832,2000-10-12,71209,46211,105,196430100,3
001041004832,2000-10-12,71209,52101,105,196430100,3

And the output is:

           id        date  oked_1  oked_2  KPS    address  type
0  1041004832  2000-10-12   71209    1111  105  196430100     3
1  1041004832  2000-10-12   71209   46211  105  196430100     3
2  1041004832  2000-10-12   71209   52101  105  196430100     3

           id        date  oked_1  oked_2  KPS    address  type
0  1041004832  2000-10-12   71209    1111  105  196430100     3
1  1041004832  2000-10-12    1111   46211  105  196430100     3
2  1041004832  2000-10-12   46211   52101  105  196430100     3
3  1041004832  2000-10-12   52101   52101  105  196430100     3

And it is working as expected!

Upvotes: 2

Pavan Suvarna
Pavan Suvarna

Reputation: 501

You can create separate data frames for oked_1 and oked_2 and then drop duplicates & combine the dataframe. As shown below.

df = pd.read_csv(filepath, dtype = str) #this is your main dataframe

df1 = df.drop(columns = ['oked_2']).drop_duplicates(subset=['oked_1'])
df2 = df.drop(columns = ['oked_1']).drop_duplicates(subset=['oked_2']).rename(columns = {'oked_2': 'oked_1'})

data = pd.concat([df1,df2]).reset_index()
print(data)

which looks like this

   index          id        date oked_1  KPS    address type
0      0  1041004832  2000-10-12  71209  105  196430100    3
1      0  1041004832  2000-10-12  01111  105  196430100    3
2      1  1041004832  2000-10-12  46211  105  196430100    3
3      2  1041004832  2000-10-12  52101  105  196430100    3

Upvotes: 0

Related Questions