Reputation: 107
I have a dataframe like this:
   TEST_NUM  SITE_NUM   RESULT  TEST_FLG  TEST_TXT              UNITS  LO_LIMIT  HI_LIMIT
0  100       0          -0.4284 P         Continuity_PPMU XSCI  V      -1        -0.3
1  100       1          -0.4274 P         Continuity_PPMU XSCI  V      -1        -0.3
2  100       2          -0.4276 P         Continuity_PPMU XSCI  V      -1        -0.3
3  100       3          -0.4289 P         Continuity_PPMU XSCI  V      -1        -0.3
4  101       0          -0.4569 P         Continuity_PPMU XSCO  V      -1        -0.3
The TEST_TXT column has 53 unique values.
I want my dataframe to look like this:
LDO_Discharge V12 | Continuity_PPMU XSCI | Continuity_PPMU XSCO | Continuity_PPMU ADBUS0 | Continuity_PPMU ADBUS1 ....
1.04              | 3.343                | 1.91                 | 2.1                    | 3.1
Basically, all values of the RESULT column for the different TEST_TXT values should sit side by side, one column per TEST_TXT.
The tricky part is that LDO_Discharge V12 has 5512 values while Continuity_PPMU ADBUS0 has 5528 values. They need to be placed side by side on the basis of SITE_NUM.
So the first row of LDO_Discharge V12 with SITE_NUM = 0 should sit next to the first row of Continuity_PPMU ADBUS0 with SITE_NUM = 0, and so on. They should be joined such that each row has the same SITE_NUM.
I could have done this easily if SITE_NUM were unique or if the counts were equal, but they are not (5512 for 'LDO_Discharge V12' vs 5528 for 'Continuity_PPMU ADBUS0', and other counts for other values).
I want to ask how to combine the data such that the SITE_NUM of "Continuity_PPMU ADBUS0" lines up with the SITE_NUM of LDO_Discharge V12, in order.
And if there is no value for a particular combination (say SITE_NUM = 3 is missing for "Continuity_PPMU XSCI", which is possible because the counts differ between TEST_TXT values), it should leave a NULL there.
It was hard to explain like this. Please let me know if some more clarification is required.
Upvotes: 1
Views: 308
Reputation: 520
I didn't really understand what your expected output is supposed to look like, so it would help if you could clarify that. However, from what I gathered, you should take a look at a few pandas functions:
Check out the axis=1 argument of pd.concat, which lets you concatenate along the columns, i.e. place the objects side by side, aligned on their index.
df = pd.concat(iterator, axis=1)  # iterator yields the Series/DataFrames to join; returns a DataFrame
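To illustrate how axis=1 aligns on the index, here is a minimal sketch. The two series and their values are made-up toy data (not taken from the real dataset), indexed by SITE_NUM. Note that this kind of alignment generally needs the index labels to be unique within each series, which is not the case for SITE_NUM in the question as-is.

```python
import pandas as pd

# Hypothetical toy data: RESULT values of two tests, indexed by SITE_NUM.
xsci = pd.Series([-0.4284, -0.4274, -0.4276], index=[0, 1, 2],
                 name="Continuity_PPMU XSCI")
xsco = pd.Series([-0.4569, -0.4561], index=[0, 2],
                 name="Continuity_PPMU XSCO")

# axis=1 puts the series side by side and aligns them on the index;
# an index label missing from one series becomes NaN in its column.
wide = pd.concat([xsci, xsco], axis=1)
wide.index.name = "SITE_NUM"
print(wide)
```

Here SITE_NUM = 1 has no XSCO value, so that cell is left as NaN.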
But maybe you want to check out pd.DataFrame.groupby, then agg on the grouped result, and after grouping take a look at pd.DataFrame.sort_values. That would be something like:
df = pd.DataFrame()  # placeholder: your own data goes here
gdf = df.groupby(by=columns_to_group).agg({column_agg: function_agg, ...})  # placeholder names; returns a DataFrame after `agg`
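If the goal is what the question seems to describe (first SITE_NUM = 0 row of one test next to the first SITE_NUM = 0 row of another), one possible sketch is to number the repeats with groupby(...).cumcount() and then unstack TEST_TXT into columns. This is only a guess at the intended output; the toy data and the OCCURRENCE column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data in the question's long format (relevant columns only).
df = pd.DataFrame({
    "SITE_NUM": [0, 1, 2, 3, 0, 1, 3, 0],
    "TEST_TXT": ["Continuity_PPMU XSCI"] * 4
                + ["Continuity_PPMU XSCO"] * 3
                + ["Continuity_PPMU XSCI"],
    "RESULT": [-0.4284, -0.4274, -0.4276, -0.4289,
               -0.4569, -0.4561, -0.4570, -0.4291],
})

# Number the repeats of each (TEST_TXT, SITE_NUM) pair in order of appearance:
# 0 for the first occurrence, 1 for the second, and so on.
df["OCCURRENCE"] = df.groupby(["TEST_TXT", "SITE_NUM"]).cumcount()

# (OCCURRENCE, SITE_NUM, TEST_TXT) is now unique, so TEST_TXT can be moved
# into the columns; combinations that do not exist are filled with NaN.
wide = (
    df.set_index(["OCCURRENCE", "SITE_NUM", "TEST_TXT"])["RESULT"]
      .unstack("TEST_TXT")
)
print(wide)
```

In this toy example XSCO has no row for SITE_NUM = 2 and no second row for SITE_NUM = 0, so both cells come out as NaN, while the second XSCI reading for SITE_NUM = 0 gets its own row.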
Upvotes: 1