Sanghamitra Lahiri

Reputation: 321

Create a dataset from two different datasets

I have two different datasets:

1. state    VDM MDM OM
      AP    1   2   5
     GOA    1   2   1
      GU    1   2   4
      KA    1   5   1

2. Attribute:Value     Support     Item
             VDM:1     4           1
             VDM:2     0           2
             VDM:3     0           3
             VDM:4     0           4
             VDM:5     0           5
             MDM:1     0           6
             MDM:2     3           7
             MDM:3     0           8
             MDM:4     0           9
             MDM:5     1           10
              OM:1     2           11
              OM:2     0           12
              OM:3     0           13
              OM:4     1           14
              OM:5     1           15

The first dataset contains only values from 1 to 5. The second dataset holds each Attribute:Value pair, its number of occurrences (Support), and a sequence number (Item).

I want a dataset that looks like this:

state      Item Number
   AP      1, 7, 15
  GOA      1, 7, 11
   GU      1, 7, 14
   KA      1, 10, 11
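
For anyone who wants to reproduce this, the two tables above could be built roughly as follows. This is only a sketch: it assumes pandas, and the names df1 and df2 are simply the ones the answers below use, not something fixed by the data itself.

```python
import pandas as pd

# First dataset: one row per state, each attribute holding a value from 1 to 5.
df1 = pd.DataFrame({
    'state': ['AP', 'GOA', 'GU', 'KA'],
    'VDM': [1, 1, 1, 1],
    'MDM': [2, 2, 2, 5],
    'OM': [5, 1, 4, 1],
})

# Second dataset: one row per Attribute:Value pair, numbered 1..15 in Item.
attrs = ['VDM', 'MDM', 'OM']
df2 = pd.DataFrame({
    'Attribute:Value': [f'{a}:{v}' for a in attrs for v in range(1, 6)],
    'Support': [4, 0, 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 1, 1],
    'Item': list(range(1, 16)),
})
```

Each cell of df1, read together with its column name (for example MDM and 5), corresponds to exactly one row of df2 (MDM:5, Item 10).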

Upvotes: 3

Views: 125

Answers (3)

piRSquared

Reputation: 294498

None of these are really appealing to me. But sometimes you just have to thrash about to get your data munged.

Attempt #0

a = dict(zip(df2['Attribute:Value'], df2['Item']))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number':
    [', '.join([str(a[f'{c}:{t._asdict()[c]}']) for c in cols]) for t in df1.itertuples()]
}

df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Attempt #1

a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
d1 = df1.set_index('state').astype(str)
r1 = (d1.columns + ':' + d1).replace(a)  # Thanks @anky_91
# r1 = (d1.columns + ':' + d1).applymap(a.get)
r1

      VDM MDM  OM
state            
AP      1   7  15
GOA     1   7  11
GU      1   7  14
KA      1  10  11

Then

pd.DataFrame({'state': r1.index, 'Item Number': [*map(', '.join, zip(*map(r1.get, r1)))]})

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Attempt #2

a = dict(zip(df2['Attribute:Value'], df2['Item'].astype(str)))
cols = ['VDM', 'MDM', 'OM']
b = {
    'Item Number':
    [*map(', '.join, zip(*[[a[f'{c}:{i}'] for i in df1[c]] for c in cols]))]
}

df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Attempt #3

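from itertools import cycle
import numpy as np

# Key the lookup by (attribute, value) tuples, e.g. ('VDM', '1') -> '1'.
# Iterating over the .str accessor no longer works in current pandas, so build the tuples directly.
a = dict(zip(map(tuple, df2['Attribute:Value'].str.split(':')), df2['Item'].astype(str)))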
d = df1.set_index('state')
b = {
    'Item Number':
    [*map(', '.join, zip(*[map(a.get, zip(cycle(d), np.ravel(d).astype(str)))] * 3))]
}

df1[['state']].assign(**b)

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Attempt #4

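# Tuple keys give a Series with a two-level index of (attribute, value).
# Iterating over the .str accessor no longer works in current pandas, so build the tuples directly.
a = pd.Series(dict(zip(
    map(tuple, df2['Attribute:Value'].str.split(':')),
    df2.Item.astype(str)
)))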

df1.set_index('state').stack().astype(str).groupby(level=0).apply(
    lambda s: ', '.join(map(a.get, s.xs(s.name).items()))
).reset_index(name='Item Number')

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Upvotes: 5

BENY

Reputation: 323346

I feel like this is a merge and pivot problem.

s=df2['Attribute:Value'].str.split(':',expand=True).assign(Item=df2.Item)
s[1]=s[1].astype(int)
s1=df1.melt('state')


# pivot takes keyword arguments only in pandas 2.0 and later
s1.merge(s,right_on=[0,1],left_on=['variable','value']).pivot(index='state',columns='variable',values='Item')
Out[113]: 
variable  MDM  OM  VDM
state                 
AP          7  15    1
GOA         7  11    1
GU          7  14    1
KA         10  11    1
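
The merge-and-pivot result above is a wide table rather than the single comma-separated column the question asks for. One possible way to finish the job is sketched below; it assumes the sample data from the question, and the names lookup, wide, cols and result are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    'state': ['AP', 'GOA', 'GU', 'KA'],
    'VDM': [1, 1, 1, 1],
    'MDM': [2, 2, 2, 5],
    'OM': [5, 1, 4, 1],
})
cols = ['VDM', 'MDM', 'OM']
df2 = pd.DataFrame({
    'Attribute:Value': [f'{c}:{v}' for c in cols for v in range(1, 6)],
    'Support': [4, 0, 0, 0, 0, 0, 3, 0, 0, 1, 2, 0, 0, 1, 1],
    'Item': list(range(1, 16)),
})

# Split "VDM:1" into columns 0 (attribute) and 1 (value), keeping Item alongside.
lookup = df2['Attribute:Value'].str.split(':', expand=True).assign(Item=df2['Item'])
lookup[1] = lookup[1].astype(int)

# Melt df1 to long form, merge on (attribute, value), then pivot back to wide.
wide = (
    df1.melt('state')
       .merge(lookup, left_on=['variable', 'value'], right_on=[0, 1])
       .pivot(index='state', columns='variable', values='Item')
)

# Restore the original column order and join each row into one string.
result = (
    wide[cols].astype(str)
              .apply(', '.join, axis=1)
              .reset_index(name='Item Number')
)
print(result)
```

Selecting wide[cols] matters because pivot sorts the columns alphabetically (MDM, OM, VDM), which would otherwise change the order of the joined item numbers.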

Upvotes: 3

Chris Adams

Reputation: 18647

Here is another approach using stack, map and unstack:

s = df1.set_index('state').stack()
s_map = df2.set_index(['Attribute:Value'])['Item']
s.loc[:] = (s.index.get_level_values(1) + ':' + s.astype(str)).map(s_map)
s.unstack().astype(str).apply(', '.join, axis=1).reset_index(name='Item Number')

[out]

  state Item Number
0    AP    1, 7, 15
1   GOA    1, 7, 11
2    GU    1, 7, 14
3    KA   1, 10, 11

Upvotes: 3
