Reputation: 757
I have over 10,000 such files, and I am trying to turn them into a template.
My strings look like this:
"MLKT_3C_AAAU_01A"
"MLKT_3C_AALI_01A"
"MLKT_3C_AALJ_01A"
"MLKT_3C_AALK_01A"
"MLKT_4H_AAAK_01A"
I am trying to convert them to this:
names(MLKT_3C_AAAU_01A)[2] <- '3C_AAAU_01A' df<- full_join(df,MLKT_3C_AAAU_01A, by = 'V1')
names(MLKT_3C_AALI_01A)[2] <- '3C_AALI_01A' df<- full_join(df,MLKT_3C_AALI_01A, by = 'V1')
names(MLKT_3C_AALJ_01A)[2] <- '3C_AALJ_01A' df<- full_join(df,MLKT_3C_AALJ_01A, by = 'V1')
names(MLKT_3C_AALK_01A)[2] <- '3C_AALK_01A' df<- full_join(df,MLKT_3C_AALK_01A, by = 'V1')
names(MLKT_4H_AAAK_01A)[2] <- '4H_AAAK_01A' df<- full_join(df,MLKT_4H_AAAK_01A, by = 'V1')
The best way I have found so far is to write them one by one in a text editor. Is there a way in bash to take the strings above and convert them into the format shown in my example?
Before I start, I remove the quotation marks from each line:
sed 's/"//g' example.txt > exampleout.txt
At first, I try to add names( at the beginning of each line. Let's say the file that contains one of these strings per line is called exampleout.txt. The following gives me names( three times instead of once:
awk '$0="names("$0' exampleout.txt > myout.txt
Then I try to append )[2] <- '' df<- full_join(df,, by = 'V1') to the end of each line using the following:
sed -e 's/$/)[2] <- '' df<- full_join(df,, by = 'V1') /' myout.txt > myout2.txt
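This led me to the following. The single quotes are gone, because the shell treats them as the ends of the single-quoted sed script: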
names(MLKT_3C_AAAU_01A )[2] <- df<- full_join(df,, by = V1)
names(MLKT_3C_AALI_01A)[2] <- df<- full_join(df,, by = V1)
names(MLKT_3C_AALJ_01A )[2] <- df<- full_join(df,, by = V1)
names(MLKT_3C_AALK_01A)[2] <- df<- full_join(df,, by = V1)
names(MLKT_4H_AAAK_01A)[2] <- df<- full_join(df,, by = V1)
Upvotes: 0
Views: 113
Reputation: 8987
You can actually do it all in one command. The script below is similar to sed, except that I've chosen perl to exploit non-greedy matching (.*?_(.*)) to separate the first underscore-delimited field.
perl -pe "s/^\"(.*?_(.*))\"$/names(\1)[2] <- '\2' df <- full_join(df, \1, by = 'V1')/" example.txt
Here, I've captured two strings.
For instance, in "MLKT_3C_AAAU_01A", the first capture would be MLKT_3C_AAAU_01A and the second capture would be 3C_AAAU_01A.
Afterwards, the appropriate substitutions are made.
If the field preceding the first underscore is a constant (e.g. MLKT), you could use sed instead, replacing the non-greedy match with the constant:
sed -E "s/^\"(MLKT_(.*))\"$/names(\1)[2] <- '\2' df <- full_join(df, \1, by = 'V1')/" example.txt
Note the use of the -E flag (for extended regexes, which make group capturing easier) and of double quotes around the script (so that single quotes can appear in the replacement).
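If your sed does not accept -E (some older implementations do not), the same substitution can be sketched as a POSIX basic regular expression, where groups are written \( and \). The printf line only stands in for example.txt; replace it with your own input file.

```shell
#!/bin/sh
# Sample input in the question's format (quoted names, one per line)
printf '%s\n' '"MLKT_3C_AAAU_01A"' '"MLKT_4H_AAAK_01A"' |
  # Basic regex: groups are \( ... \); \$ keeps the shell from touching the anchor
  sed "s/^\"\(MLKT_\(.*\)\)\"\$/names(\1)[2] <- '\2' df<- full_join(df,\1, by = 'V1')/"
```

Each output line has the form names(MLKT_3C_AAAU_01A)[2] <- '3C_AAAU_01A' df<- full_join(df,MLKT_3C_AAAU_01A, by = 'V1').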
Upvotes: 3
Reputation: 203229
$ awk -F'"' '{
x=$2; sub(/^[^_]+_/,"",x)
printf "names(%s)[2] <- \047%s\047 df<- full_join(df,%s, by = \047V1\047)\n", $2, x, $2
}' file
names(MLKT_3C_AAAU_01A)[2] <- '3C_AAAU_01A' df<- full_join(df,MLKT_3C_AAAU_01A, by = 'V1')
names(MLKT_3C_AALI_01A)[2] <- '3C_AALI_01A' df<- full_join(df,MLKT_3C_AALI_01A, by = 'V1')
names(MLKT_3C_AALJ_01A)[2] <- '3C_AALJ_01A' df<- full_join(df,MLKT_3C_AALJ_01A, by = 'V1')
names(MLKT_3C_AALK_01A)[2] <- '3C_AALK_01A' df<- full_join(df,MLKT_3C_AALK_01A, by = 'V1')
names(MLKT_4H_AAAK_01A)[2] <- '4H_AAAK_01A' df<- full_join(df,MLKT_4H_AAAK_01A, by = 'V1')
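Since the question asks about bash, here is a sketch of the same transformation using only shell builtins, as an alternative to the awk command above (not from any of the original answers). ${name#*_} removes the shortest prefix ending in an underscore, doing roughly the same job as sub(/^[^_]+_/,"",x). The printf line stands in for the input file. A shell read loop tends to be slower than awk on large inputs, so awk is still the better default.

```shell
#!/bin/sh
# Sample input; in practice, end the loop with: done < example.txt
printf '%s\n' '"MLKT_3C_AAAU_01A"' '"MLKT_4H_AAAK_01A"' |
while IFS= read -r line; do
  name=${line#\"}             # strip a leading double quote, if any
  name=${name%\"}             # strip a trailing double quote, if any
  [ -n "$name" ] || continue  # skip blank lines
  # ${name#*_} drops everything up to and including the first "_"
  printf "names(%s)[2] <- '%s' df<- full_join(df,%s, by = 'V1')\n" \
    "$name" "${name#*_}" "$name"
done
```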
Upvotes: 0
Reputation: 189327
Replacing a regex match with something is easily done with sed.
sed 's/^"\(MLKT_\([^"]*\)\)"$/things with \1 and even \2 in it/' file >newfile
The expression \1 in the replacement text corresponds to the first parenthesized group in the regular expression, and \2 corresponds to the second. So if you matched "MLKT_1234", then \1 will be the entire string inside the quotes (MLKT_1234), and \2 will be 1234.
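Here is a minimal run of that expression on the MLKT_1234 example, with the replacement text changed so it simply prints the two groups:

```shell
#!/bin/sh
# \1 = whole name inside the quotes, \2 = whatever follows "MLKT_"
printf '%s\n' '"MLKT_1234"' |
  sed 's/^"\(MLKT_\([^"]*\)\)"$/whole=\1 rest=\2/'
# prints: whole=MLKT_1234 rest=1234
```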
If you need single quotes in the replacement, you have to unwrap them somehow. Perhaps the simplest mechanical replacement is to express each literal single quote as '\'', which is a closing single quote for the single-quoted string you are in, then a literal unquoted but backslashed single quote, and then an opening single quote to continue single-quoting the text that follows.
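Applied to this question, the '\'' trick gives a fully single-quoted sed command like the following sketch (the printf line stands in for the input file):

```shell
#!/bin/sh
# Each literal ' in the replacement is written as '\'' :
# close the quoted script, add an escaped quote, reopen the script.
printf '%s\n' '"MLKT_3C_AALI_01A"' |
  sed 's/^"\(MLKT_\([^"]*\)\)"$/names(\1)[2] <- '\''\2'\'' df<- full_join(df,\1, by = '\''V1'\'')/'
# prints: names(MLKT_3C_AALI_01A)[2] <- '3C_AALI_01A' df<- full_join(df,MLKT_3C_AALI_01A, by = 'V1')
```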
For any nontrivial replacements, though, perhaps you want to investigate Awk, which is somewhat more human-readable.
awk '{ # strip the surrounding double quotes
sub(/^"/, ""); sub(/"$/, "");
# Now you can use $0 to refer to the remaining string
# Inside the single-quoted script, write each single quote as \047
# substr($0, 6) drops the 5-character prefix "MLKT_"
# randomstring is a placeholder (unset here, so it prints as empty)
print "names(" $0 ")[2] <- \047" \
substr($0, 6) "\047 df<- full_join(df," \
randomstring ", by = \047V1\047)" }' file >newfile
If randomstring comes from a second file, there's a common Awk pattern for joining values from two files (google for NR==FNR).
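A minimal sketch of that NR==FNR pattern, assuming a hypothetical two-column file map.txt that pairs each name with the object to join (obj_AAAU is made up for illustration). While awk reads the first file, NR equals FNR, so the pairs are stored; for the second file, the stored value takes the place of randomstring.

```shell
#!/bin/sh
# Hypothetical inputs, written to a temporary directory:
#   map.txt   - "name object" pairs (object is what full_join should use)
#   names.txt - the quoted names, one per line, as in the question
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'MLKT_3C_AAAU_01A obj_AAAU' > "$tmp/map.txt"
printf '%s\n' '"MLKT_3C_AAAU_01A"' > "$tmp/names.txt"

awk '
  NR == FNR { other[$1] = $2; next }  # first file: remember name -> object
  {                                   # second file: build the R line
    sub(/^"/, ""); sub(/"$/, "")
    print "names(" $0 ")[2] <- \047" substr($0, 6) "\047 df<- full_join(df," \
          other[$0] ", by = \047V1\047)"
  }' "$tmp/map.txt" "$tmp/names.txt"
# prints: names(MLKT_3C_AAAU_01A)[2] <- '3C_AAAU_01A' df<- full_join(df,obj_AAAU, by = 'V1')
```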
Upvotes: 2
Reputation: 133458
Could you please try the following?
awk -v s1="'" '
match($0,/[a-zA-Z][^"]*/){
val=substr($0,RSTART,RLENGTH)
split(val,array,"_")
print "names(" val ")[2] <- " s1 array[2]"_"array[3]"_"array[4] s1 " df<- full_join(df," val", by = " s1 "V1" s1")"
}' Input_file
The output will be as follows:
names(MLKT_3C_AAAU_01A)[2] <- '3C_AAAU_01A' df<- full_join(df,MLKT_3C_AAAU_01A, by = 'V1')
names(MLKT_3C_AALI_01A)[2] <- '3C_AALI_01A' df<- full_join(df,MLKT_3C_AALI_01A, by = 'V1')
names(MLKT_3C_AALJ_01A)[2] <- '3C_AALJ_01A' df<- full_join(df,MLKT_3C_AALJ_01A, by = 'V1')
names(MLKT_3C_AALK_01A)[2] <- '3C_AALK_01A' df<- full_join(df,MLKT_3C_AALK_01A, by = 'V1')
names(MLKT_4H_AAAK_01A)[2] <- '4H_AAAK_01A' df<- full_join(df,MLKT_4H_AAAK_01A, by = 'V1')
Upvotes: 2