Reputation: 679
I would like to duplicate the values that are separated by |
, transforming them into new columns.
The following subset example file would be:
1_A 2_A 3_A
1|0 0|0 0|0
0|0 0|0 1|1
1|1 1|0 1|0
0|1 1|1 0|0
0|1 1|0 0|0
0|0 0|0 0|0
0|1 1|1 0|1
0|0 0|0 1|0
1|1 1|0 0|1
0|0 0|0 0|0
The desired format would be:
1_A 1_B 2_A 2_B 3_A 3_B
1|1 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 1|1 1|1
1|1 1|1 1|1 0|0 1|1 0|0
0|0 1|1 1|1 1|1 0|0 0|0
0|0 1|1 1|1 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0
0|0 1|1 1|1 1|1 0|0 1|1
0|0 0|0 0|0 0|0 1|1 0|0
1|1 1|1 1|1 0|0 0|0 1|1
0|0 0|0 0|0 0|0 0|0 0|0
On the example above, the 1_B sample is the duplicate values to the right of the pipe character from the 1_A sample.
Thanks for any lead.
Upvotes: 0
Views: 127
Reputation: 37464
Yet another in awk:
$ awk '{
for(i=1;i<=NF;i++)
if(NR==1)
sub(/A/,"& " i "_B",$i)
else {
t=$i
sub(/\|/," ",t)
sub(/\|/,"|"t"|",$i)
}
}1' file
head -3
of output:
1_A 1_B 2_A 2_B 3_A 3_B
1|1 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 1|1 1|1
...
And one in sed
:
$ sed '1s/\(._\)A/\1A \1B/g; s/\([01]\)|\([01]\)/\1|\1 \2|\2/g' file
Output:
1_A 1_B 2_A 2_B 3_A 3_B
1|1 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 1|1 1|1
...
Upvotes: 3
Reputation: 67567
awk
to the rescue!
$ awk 'NR==1 {for(i=1;i<=NF;i++)
{t=$i;
sub("A","B",$i);
printf "%s %s ",t,$i}
print ""; next}
{for(i=1;i<=NF;i++)
{split($i,a,"|");
printf "%s %s ",a[1]"|"a[1],a[2]"|"a[2]}
print ""}' file
1_A 1_B 2_A 2_B 3_A 3_B
1|1 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 1|1 1|1
1|1 1|1 1|1 0|0 1|1 0|0
0|0 1|1 1|1 1|1 0|0 0|0
0|0 1|1 1|1 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0
0|0 1|1 1|1 1|1 0|0 1|1
0|0 0|0 0|0 0|0 1|1 0|0
1|1 1|1 1|1 0|0 0|0 1|1
0|0 0|0 0|0 0|0 0|0 0|0
Upvotes: 3
Reputation: 133780
Could you please try following.
awk -F"[ |]" '
FNR==1{
print $1,"1_B",$2,"2_B",$3,"3_B"
next
}
FNR>1{
for(i=1;i<=NF;i++){
$i=$i "|" $i
}
}
1
' Input_file
Explanation: Adding explanation for above code.
awk -F"[ |]" ' ##Setting field separator eiter space or pipe here.
FNR==1{ ##Checking condition if this is first line.
print $1,"1_B",$2,"2_B",$3,"3_B" ##Printing headers as per OP.
next ##Using next will skip all further statements from here.
} ##Closing FNR==1 condition BLOCK here.
FNR>1{ ##Checking condition if FNR>1 then do following.
for(i=1;i<=NF;i++){ ##Starting a for loop from i=1 to till value of NF(number of fields in current line).
$i=$i "|" $i ##Setting value of current field to current field |(pipe) current field value here.
} ##Closing BLOCK for for loop here.
} ##Closing BLOCK for FNR>1 condition here.
1 ##Mentioning 1 will print edited/non-edited current line here.
' Input_file ##Mentioning Input_file(which we need to process here).
Output will be as follows.
1_A 1_B 2_A 2_B 3_A 3_B
1|1 0|0 0|0 0|0 0|0 0|0
0|0 0|0 0|0 0|0 1|1 1|1
1|1 1|1 1|1 0|0 1|1 0|0
0|0 1|1 1|1 1|1 0|0 0|0
0|0 1|1 1|1 0|0 0|0 0|0
0|0 0|0 0|0 0|0 0|0 0|0
0|0 1|1 1|1 1|1 0|0 1|1
0|0 0|0 0|0 0|0 1|1 0|0
1|1 1|1 1|1 0|0 0|0 1|1
0|0 0|0 0|0 0|0 0|0 0|0
Upvotes: 4
Reputation: 3908
This should work:
def split_values(s):
# split the string over '|', and format each to the expected output
return [*map(lambda _: f"{_}|{_}", s.split('|'))]
# list of dataframes which we will later concat over
tmp_df_lst = []
for col in df.columns:
# apply split_values over each column
tmp_df = pd.DataFrame(df[col].apply(split_values).values.tolist(),
columns=[f"{col[:-1]}A", f"{col[:-1]}B"])
tmp_df_lst.append(tmp_df)
# result
pd.concat(tmp_df_lst, axis=1)
Output:
1_A 1_B 2_A 2_B 3_A 3_B
0 1|1 0|0 0|0 0|0 0|0 0|0
1 0|0 0|0 0|0 0|0 1|1 1|1
2 1|1 1|1 1|1 0|0 1|1 0|0
3 0|0 1|1 1|1 1|1 0|0 0|0
4 0|0 1|1 1|1 0|0 0|0 0|0
5 0|0 0|0 0|0 0|0 0|0 0|0
6 0|0 1|1 1|1 1|1 0|0 1|1
7 0|0 0|0 0|0 0|0 1|1 0|0
8 1|1 1|1 1|1 0|0 0|0 1|1
9 0|0 0|0 0|0 0|0 0|0 0|0
Upvotes: 1