dough
dough

Reputation: 13

How to flatten multiple column and rows in dataframe?

Here's my excel data:

Main Topic
Sr No column 1 column 2
Sr No sub-col1 sub-col2 sub-col3 sub-col4 sub-col1 sub-col2 sub-col3 sub-col4
First Topic
1) Sub Topic-1
1 107 207 307 407 507 607 707 807
2 108 208 308 408 508 608 708 808
3 109 209 309 409 509 609 709 809
4 110 210 310 410 510 610 710 810
2) Sub Topic-2
1 113 213 313 413 513 613 713 813
2 114 214 314 414 514 614 714 814
3 115 215 315 415 515 615 715 815
4 116 216 316 416 516 616 716 816
Second Topic
1) Sub Topic-1
1 120 220 320 420 520 620 720 820
2 121 221 321 421 521 621 721 821
3 122 222 322 422 522 622 722 822
4 123 223 323 423 523 623 723 823
2) Sub Topic-2
1 126 226 326 426 526 626 726 826
2 127 227 327 427 527 627 727 827
3 128 228 328 428 528 628 728 828
4 129 229 329 429 529 629 729 829

image1.png

enter image description here

Now, I want to change the data (because my data modeling guy said to make it this way before adding to sql) and I don't know how to change it to such dataframe.

The change is shown in below image:

image2.png

enter image description here

P.S. Note: This is dummy data and the length of rows and columns always vary. Any help me appreciated.

Upvotes: 0

Views: 763

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31166

This is just about being systematic

  • as your "data modeller" has correctly identified, the source data is not fully structured.
  • work out your row and column multi-indexes and then it is structured
src = """| Main Topic     |          |          |          |          |          |          |          |          |
|----------------|----------|----------|----------|----------|----------|----------|----------|----------|
| Sr No          | column 1 |          |          |          | column 2 |          |          |          |
| Sr No          | sub-col1 | sub-col2 | sub-col3 | sub-col4 | sub-col1 | sub-col2 | sub-col3 | sub-col4 |
| First Topic    |          |          |          |          |          |          |          |          |
| 1) Sub Topic-1 |          |          |          |          |          |          |          |          |
| 1              | 107      | 207      | 307      | 407      | 507      | 607      | 707      | 807      |
| 2              | 108      | 208      | 308      | 408      | 508      | 608      | 708      | 808      |
| 3              | 109      | 209      | 309      | 409      | 509      | 609      | 709      | 809      |
| 4              | 110      | 210      | 310      | 410      | 510      | 610      | 710      | 810      |
|                |          |          |          |          |          |          |          |          |
| 2) Sub Topic-2 |          |          |          |          |          |          |          |          |
| 1              | 113      | 213      | 313      | 413      | 513      | 613      | 713      | 813      |
| 2              | 114      | 214      | 314      | 414      | 514      | 614      | 714      | 814      |
| 3              | 115      | 215      | 315      | 415      | 515      | 615      | 715      | 815      |
| 4              | 116      | 216      | 316      | 416      | 516      | 616      | 716      | 816      |
|                |          |          |          |          |          |          |          |          |
| Second Topic   |          |          |          |          |          |          |          |          |
| 1) Sub Topic-1 |          |          |          |          |          |          |          |          |
| 1              | 120      | 220      | 320      | 420      | 520      | 620      | 720      | 820      |
| 2              | 121      | 221      | 321      | 421      | 521      | 621      | 721      | 821      |
| 3              | 122      | 222      | 322      | 422      | 522      | 622      | 722      | 822      |
| 4              | 123      | 223      | 323      | 423      | 523      | 623      | 723      | 823      |
|                |          |          |          |          |          |          |          |          |
| 2) Sub Topic-2 |          |          |          |          |          |          |          |          |
| 1              | 126      | 226      | 326      | 426      | 526      | 626      | 726      | 826      |
| 2              | 127      | 227      | 327      | 427      | 527      | 627      | 727      | 827      |
| 3              | 128      | 228      | 328      | 428      | 528      | 628      | 728      | 828      |
| 4              | 129      | 229      | 329      | 429      | 529      | 629      | 729      | 829      |"""

# uniform grid of data...
df = (pd.read_csv(io.StringIO(src), sep="|", header=None).drop(columns=[0,10])
 .applymap(str.strip).replace("",np.nan).dropna(how="all").drop(1)
 .reset_index(drop=True)
 .fillna(method="ffill", axis=1)
)

# first 3 rows are column names.. of which first is the main topic
df2 = df.iloc[3:].copy()
# extract topic and sub-topic from relevent rows, put onto other rows with fillna()
df2 = (df2.assign(st=df2[1].str.contains(r"^\d+\).*\-\d+$"),
          t=lambda dfa: ~dfa.st & dfa[1].str.contains("^[A-Z,a-z, ]+"),
          tk=lambda dfa: np.where(dfa.t, dfa[1], np.nan),
          stk=lambda dfa: np.where(dfa.st, dfa[1], np.nan))
 .fillna(method="ffill")
)

# remove rows that contained topic & sub-topic, plus working columns
# topic & sub-topic are part of the index
df2 = df2.drop(df2.loc[df2.t | df2.st].index).drop(columns=["t","st"]).set_index(["tk","stk"])
# back to first 3 rows are column names, 1st being main topic
df2.columns = pd.MultiIndex.from_arrays(df.iloc[0:3].values, names=["topic","col","sub-col"])
# move SR No into index, then stack first level of column multi-index into row index
# cleanup order of items in the index
df2 = df2.set_index(df2.columns[0], append=True).stack(0).reorder_levels([3,0,1,2])

# cleanup index name
df2.reindex(df2.index.set_names("SR No",level=3))

output

col                                          column 1                            column 2                           
sub-col                                      sub-col1 sub-col2 sub-col3 sub-col4 sub-col1 sub-col2 sub-col3 sub-col4
topic      tk           stk            SR No                                                                        
Main Topic First Topic  1) Sub Topic-1 1          107      207      307      407      507      607      707      807
                                       2          108      208      308      408      508      608      708      808
                                       3          109      209      309      409      509      609      709      809
                                       4          110      210      310      410      510      610      710      810
                        2) Sub Topic-2 1          113      213      313      413      513      613      713      813
                                       2          114      214      314      414      514      614      714      814
                                       3          115      215      315      415      515      615      715      815
                                       4          116      216      316      416      516      616      716      816
           Second Topic 1) Sub Topic-1 1          120      220      320      420      520      620      720      820
                                       2          121      221      321      421      521      621      721      821
                                       3          122      222      322      422      522      622      722      822
                                       4          123      223      323      423      523      623      723      823
                        2) Sub Topic-2 1          126      226      326      426      526      626      726      826
                                       2          127      227      327      427      527      627      727      827
                                       3          128      228      328      428      528      628      728      828
                                       4          129      229      329      429      529      629      729      829

Upvotes: 1

Related Questions