Tuesday, 1 October 2019

Regular expressions in find, bash shell, sed and grep, with examples

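find -name '*.txt'
    meta * any length; matches everything, including the empty string and including . , / etc.
         ? exactly one character, including . , / etc.
         (/ only matters with -path, since -name sees just the base name)
    These metas are wildcards: they do not apply to a preceding regexp!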
root@osboxes:/var/lib/typo3s/atypo3# find -maxdepth 0 -name '*'
.
root@osboxes:/var/lib/typo3s/atypo3# find -maxdepth 1 -name '*'
.
./uploads
./.htaccess
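
A small sketch to reproduce this in a scratch directory instead of the TYPO3 tree. The file names (a.txt, .hidden.txt, ab, .x) are made up for illustration, and mktemp is assumed to be available.

```shell
# Scratch directory with hypothetical file names.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/.hidden.txt" "$dir/ab" "$dir/.x"

# Unlike the shell, find's * also matches a leading dot.
star=$(find "$dir" -name '*.txt' | sed 's|.*/||' | LC_ALL=C sort)
echo "$star"

# ? is exactly one character, and that character may be a dot.
two=$(find "$dir" -name '??' | sed 's|.*/||' | LC_ALL=C sort)
echo "$two"

rm -r "$dir"
```

The first echo should list both .hidden.txt and a.txt, the second both .x and ab.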
   
Bash Shell:
meta * any length; matches everything that does not start with . (and never matches /).
        Does not apply to a preceding regexp.
meta ? one character; matches anything except /
     ? matches . when the ? is not the first pattern character (of a path component)
     ? is taken literally when the pattern matches nothing!!! (the same holds for *)

root@osboxes:/# echo *
bin boot dev etc home initrd.img initrd.img.old lib lost+found media mnt opt proc root run sbin srv sys tmp usr var vmlinuz vmlinuz.old
root@osboxes:/# echo *.
*.
root@osboxes:/# echo .*
. .. .cache

root@osboxes:/# echo ?
?
root@osboxes:/# echo ??
??
root@osboxes:/# echo ?.
?.
root@osboxes:/# echo .?
..
root@osboxes:/# echo ???
bin dev etc lib mnt opt run srv sys tmp usr var
root@osboxes:/# echo ???/
bin/ dev/ etc/ lib/ mnt/ opt/ run/ srv/ sys/ tmp/ usr/ var/
root@osboxes:/# ls
bin   etc         initrd.img.old  media  proc  sbin  tmp  vmlinuz
boot  home        lib             mnt    root  srv   usr  vmlinuz.old
dev   initrd.img  lost+found      opt    run   sys   var
root@osboxes:/# ls opt/
eclipse  VBoxGuestAdditions-5.2.26  wildfly  wildfly-12.0.0.Final
root@osboxes:/# echo ???????????
vmlinuz.old
root@osboxes:~# echo *
Desktop dnsstart.sh Documents Downloads history20180323_1811Uhr.txt history20190406_1621.txt Music mydumpscript.sh Pictures Public Templates Videos Xauthority
root@osboxes:~# echo */*
Downloads/eclipse-jee-oxygen-3-linux-gtk.tar.gz Downloads/wildfly-12.0.0.Final.tar.gz
root@osboxes:~# echo */.*
Desktop/. Desktop/.. Documents/. Documents/.. Downloads/. Downloads/.. Music/. Music/.. Pictures/. Pictures/.. Public/. Public/.. Templates/. Templates/.. Videos/. Videos/..
root@osboxes:~# echo .*
. .. .bash_history .bashrc .cache .config .dbus .gnupg .ICEauthority .lesshst .local .mozilla .mysql_history .nano .profile .wget-hsts
root@osboxes:~# echo .?*
.. .bash_history .bashrc .cache .config .dbus .gnupg .ICEauthority .lesshst .local .mozilla .mysql_history .nano .profile .wget-hsts
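
The same rules, sketched in a scratch directory so that the result does not depend on what happens to be in / or ~. The names abc, .dot and sub/file are hypothetical. Default shell options are assumed (no nullglob, failglob or dotglob).

```shell
dir=$(mktemp -d)
touch "$dir/abc" "$dir/.dot"
mkdir "$dir/sub"
touch "$dir/sub/file"

# Each $(...) runs in a subshell, so the cd does not leak out.
plain=$(cd "$dir" && echo *)            # * skips names starting with a dot
dots=$(cd "$dir" && echo .?*)           # the leading dot has to be written out
nomatch=$(cd "$dir" && echo *.zzz)      # nothing matches: pattern stays literal
slash=$(cd "$dir" && echo sub?file)     # ? does not match /, so this stays literal too

printf '%s\n' "$plain" "$dots" "$nomatch" "$slash"
rm -r "$dir"
```

Depending on the shell version, .?* may or may not include .. in its result; newer bash releases skip . and .. by default.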
--------------------------------------------------------------------------------

Example:
grep -G: 'abc\|ABC' , '^Anfang\\...\?E$' matches 'Anfang\12E' and 'Anfang\12TE'
grep -E: 'abc|ABC'  , '^Anfang\\...?E$' matches 'Anfang\12E' and 'Anfang\12TE'
grep -P: 'abc|ABC'  , '^Anfang\\...?E$' matches 'Anfang\12E' and 'Anfang\12TE'
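
A quick way to check the three variants against each other; a sketch assuming GNU grep (\? in basic mode is a GNU extension, and -P needs a grep built with PCRE support). The third input line, 'Anfang\1E', is an added counter-example that should not match.

```shell
# Two matching lines from the example plus one line that is too short.
input=$(printf '%s\n' 'Anfang\12E' 'Anfang\12TE' 'Anfang\1E')

# Count matching lines per dialect; only the escaping of ? differs.
bre=$(printf '%s\n' "$input" | grep -G -c '^Anfang\\...\?E$')
ere=$(printf '%s\n' "$input" | grep -E -c '^Anfang\\...?E$')
pcre=$(printf '%s\n' "$input" | grep -P -c '^Anfang\\...?E$')

echo "BRE: $bre  ERE: $ere  PCRE: $pcre"
```

All three counts should be 2.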

-------------------------------------------------------------------
sed:
-E, --regexp-extended, -r : use ext. reg exp like in grep -E

                    . also matches "\n"
             metas = ^, $, \, ., *   
    Extended metas += ?, +, {, }, (, ), |
    Prios: () > \digit > Repeats >  = \Char > Concat > ^,$ > |
    \n = Newline
    \digit = back-reference to the n-th parenthesized group
    \Char = literal char, where char is one of $, *, ., [, \, or ^
    In the substitution part of s/search-pat/subst-pat/
    the '&' stands for the whole matched string.
    \digit references can occur in both search-pat and subst-pat. This differs from Perl, which writes \digit in the search pattern and $digit in the substitution.
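
A minimal sketch of & and \digit in action. The input strings ('hello world', 'key=value', 'the the cat') are made up for illustration.

```shell
# & in the replacement is the whole match.
whole=$(echo 'hello world' | sed 's/w[a-z]*/<&>/')

# \1 and \2 in the replacement refer to the groups of the search pattern.
swapped=$(echo 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# \1 inside the search pattern itself: print lines with a doubled word.
doubled=$(printf '%s\n' 'the the cat' 'the cat' | sed -n '/\([a-z][a-z]*\) \1/p')

printf '%s\n' "$whole" "$swapped" "$doubled"
# hello <world>
# value=key
# the the cat
```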
   
Example:
sed basic : '^Das ist  *ein Satz\.\nDer \+naechste Satz\.$'
sed -E    : '^Das ist  *ein Satz\.\nDer +naechste Satz\.$'
sed 'abc*defghi\+jkl' matches 'abdefghijkl' and 'abccccdefghiijkl'
sed '\(abc\)*xyz\1\{2\}' matches 'abcxyzabcabc' and 'abcabcxyzabcabc'
    !!! but not 'xyzabcabc' !!! (with zero repetitions of the group, \1 is unset and cannot match)
root@osboxes:~# sed -n '/\(abc\)*xyz\1\{2\}/p' <<E
xyz
xyzabcabc
abcxyz
abcxyzabc
abcxyzabcabc
abcabcabcabcxyzabcabc
E
abcxyzabcabc
abcabcabcabcxyzabcabc

root@osboxes:~# sed -n '/\(abc\)xyz\1\+7/p' <<E
abcxyzabcabcabc7
abcxyz17
abcxyz117
abcxyzabc7
E
abcxyzabcabcabc7
abcxyzabc7
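
The two-line pattern from the example above only has a chance to match once both lines are in the pattern space. A sketch using the N command for that, assuming GNU sed (\+ and \n in the pattern); the two input sentences are simply the ones the pattern describes.

```shell
# N appends the next input line to the pattern space, joined by a newline,
# so the \n in the pattern has something to match.
joined=$(printf '%s\n' 'Das ist  ein Satz.' 'Der naechste Satz.' |
    sed -n 'N; /^Das ist  *ein Satz\.\nDer \+naechste Satz\.$/p')

# Without N, sed sees one line at a time and the same pattern never matches.
single=$(printf '%s\n' 'Das ist  ein Satz.' 'Der naechste Satz.' |
    sed -n '/^Das ist  *ein Satz\.\nDer \+naechste Satz\.$/p')

printf 'with N: [%s]\nwithout N: [%s]\n' "$joined" "$single"
```

With N both lines are printed; without it the result is empty.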

-------------------------------------------------------------------

grep:
Basic -G (default)
Extended -E (more metas)
Perl -P (Perl regexp)
    Basic metas     = ^, $, \, ., *
    Extended metas += ?, +, |, {, (, ) # } is not listed along with them!!! To be tested
    Prios: Repet > Concat > Alternation
    Repetitions apply to the immediately preceding regexp
    Repetitions = 
       ?      The preceding item is optional and matched at most once.
       *      The preceding item will be matched zero or more times.
       +      The preceding item will be matched one or more times.
       {n}    The preceding item is matched exactly n times.
       {n,}   The preceding item is matched n or more times.
       {,m}   The preceding item is matched at most m times. This is a GNU
              extension.
       {n,m}  The preceding item is matched at least n times, but not more
              than m times.
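
The intervals can be tried out on a few test words; a sketch assuming GNU grep (needed for {,m}). The words ab ... aaaab are made up. The last line probes the open question about }: on GNU grep a } without an opening { appears to be taken literally.

```shell
words=$(printf '%s\n' 'ab' 'aab' 'aaab' 'aaaab')

# Count the words with exactly 2, at least 2, at most 2, and 2 to 3 a's before the b.
exact=$(printf '%s\n' "$words" | grep -E -c '^a{2}b$')
least=$(printf '%s\n' "$words" | grep -E -c '^a{2,}b$')
most=$(printf '%s\n' "$words" | grep -E -c '^a{,2}b$')
range=$(printf '%s\n' "$words" | grep -E -c '^a{2,3}b$')

# The same interval in basic syntax needs backslashes.
basic=$(printf '%s\n' "$words" | grep -G -c '^a\{2,3\}b$')

# A lone } in extended syntax, matched against the literal text a}.
brace=$(printf '%s\n' 'a}' | grep -E -c 'a}')

echo "$exact $least $most $range $basic $brace"
```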

-------Standard grep restriction-----------------------------------------
https://www.gnu.org/software/grep/manual/grep.html#Regular-Expressions
How can I match across lines?
Standard grep cannot do this, as it is fundamentally line-based. Therefore, merely using the [:space:] character class does not match newlines in the way you might expect.

With the GNU grep option -z (--null-data), each input and output “line” is null-terminated; see Other Options. Thus, you can match newlines in the input, but typically if there is a match the entire input is output, so this usage is often combined with output-suppressing options like -q, e.g.:

printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
If this does not suffice, you can transform the input before giving it to grep, or turn to awk, sed, perl, or many other utilities that are designed to operate across lines.
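
Since -q prints nothing, the result has to be read from the exit status. A small sketch of the difference with and without -z, assuming GNU grep; the variable names are only for illustration.

```shell
# Without -z grep looks at each line separately, so the pattern cannot span the newline.
if printf 'foo\nbar\n' | grep -q 'foo[[:space:]]\+bar'; then plain=match; else plain=nomatch; fi

# With -z the whole input is one NUL-terminated "line" and the newline counts as [[:space:]].
if printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'; then zero=match; else zero=nomatch; fi

echo "without -z: $plain, with -z: $zero"
# without -z: nomatch, with -z: match
```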

--------grep -P supports multiline search -------------------------------------
Inspired by https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines:
A grep that supports Perl regular expressions (option -P) can do this job.

$ echo 'abc blah
blah blah
def blah
blah blah' | grep -oPz  '(?s)abc.*?def'
abc blah
blah blah
def
(?s) is the DOTALL modifier; it makes the dot in the regex match line breaks as well as ordinary characters.

root@osboxes:/var/lib/typo3s/atypo3# find fileadmin/ -name '*.ts' ! -name '*_ori*' -print0 |  xargs -0n1 grep -PHoz  '\bpage\b *= *\bPAGE\b *\n *\bpage\b *\{ *\n?'
fileadmin/templates/mobile/ts/setup.main.ts:page = PAGE
page {
root@osboxes:/var/lib/typo3s/atypo3#

-o: only the matching part is printed. Otherwise the whole file would be shown as the match, because with -z each file is treated as one long line and separate files are divided by the null character.
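
With -z every match that -o prints ends in a NUL byte instead of a newline. When the result is to be stored in a variable or piped on, translating the NUL with tr is one option; a sketch assuming GNU grep with PCRE support, reusing the sample text from above.

```shell
text='abc blah
blah blah
def blah
blah blah'

# -z terminates each match with a NUL byte; tr turns it back into a newline.
match=$(printf '%s\n' "$text" | grep -oPz '(?s)abc.*?def' | tr '\0' '\n')
echo "$match"
```

The non-greedy .*? stops at the first def, so the last line of the input is not part of the match.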
