Useful regex expressions for PHP preg_replace

Chances are, if you have worked with PHP (or any programming language) for very long, you have run into regular expressions, aka “regex”. Regular expressions are patterns used to match content within other content. A regex can, for example, extract all of the URLs in a text document, or remove repeated spaces, tabs and line breaks. There are many, many uses, but we won’t go into them here.

Below are a few handy regular expressions that you can tuck away in your coding notes for future reference. They are shown with the preg_replace() function, but you can easily pull out the regex and use it elsewhere:

Remove comments from text
Comments starting with # or //:

$text = preg_replace("%(#|(//)).*%", "", $text);
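To see what this pattern does in practice, here is a minimal sketch that runs it over a small made-up sample. The sample text and the $clean variable are assumptions for illustration only. Keep in mind that the pattern knows nothing about strings or URLs, so it also cuts everything after the // in http://.

```php
<?php
// Hypothetical sample input, for illustration only.
$text = "a = 1; # set a\nb = 2; // set b\nurl = http://example.com\n";

// Same pattern as above: strip from # or // to the end of the line.
// The dot does not match newlines by default, so each match stops at the line end.
$clean = preg_replace("%(#|(//)).*%", "", $text);

echo $clean;
// Prints:
// a = 1; 
// b = 2; 
// url = http:
```

If your text can contain URLs or # characters inside strings, you will probably want a stricter pattern or a real parser instead.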

Comments formatted like /* */:

$text = preg_replace("%/\*(?:(?!\*/).)*\*/%s", "", $text);
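The (?:(?!\*/).)* part matches one character at a time, as long as that character does not start a closing */, and the s modifier lets the dot cross line breaks. The sketch below, using a made-up sample string, shows this on a comment that spans two lines. It also compares against a lazy quantifier (.*?), which is an assumed alternative and not part of the original snippet.

```php
<?php
// Hypothetical sample input with a block comment that spans two lines.
$text = "x = 1; /* first\n   second */ y = 2;";

// Same pattern as above; the s modifier lets . match newlines.
$clean = preg_replace("%/\*(?:(?!\*/).)*\*/%s", "", $text);

// Assumed alternative: a lazy quantifier stops at the first closing */ as well.
$lazy = preg_replace('%/\*.*?\*/%s', "", $text);

echo $clean, "\n";                                  // x = 1;  y = 2;
echo $clean === $lazy ? "same" : "different", "\n"; // same
```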

Replace tabs with a space

$text = preg_replace("/\t/", " ", $text);

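Replace repeating whitespace with a single space
Note that \s matches spaces, tabs and line breaks, so this collapses any run of whitespace, not just repeated spaces: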

$text = preg_replace("/\s+/", " ", $text);
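Because \s also matches line breaks, this replace flattens the text onto a single line. The following sketch, on a made-up sample, shows the difference between \s+ and a narrower character class that only touches spaces and tabs. The narrower pattern is an assumption added for comparison, not part of the original list.

```php
<?php
// Hypothetical sample input: a double space, a tab, and a line break.
$text = "one  two\tthree\nfour";

// \s+ collapses every run of whitespace, line breaks included.
$all = preg_replace("/\s+/", " ", $text);

// Assumed alternative: collapse only runs of spaces and tabs, keep line breaks.
$horizontal = preg_replace("/[ \t]+/", " ", $text);

var_dump($all === "one two three four");         // bool(true)
var_dump($horizontal === "one two three\nfour"); // bool(true)
```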

 

If you apply any of these to an HTML page or to code, you’ll find that they may remove valid spaces, tabs and newline characters from within your <textarea> or other tags. To prevent that, first convert those characters into placeholder strings, then do your replace, and finally convert the placeholders back. For example, convert every tab inside <textarea> tags to %%tab%%, run your manipulation against the document, and then convert %%tab%% back to an actual tab. This protects the tabs within the textarea from being changed.

** Note: there may be a better way of doing this. If so, please share in the comments!

Here is the PHP code:

// Set up the callback functions that are needed below.
// Each one receives the match array from preg_replace_callback(); $array[0] is the full match.
function protect_characters($array){
    $safe=$array[0];
    $safe=preg_replace('/\n/', "%%newline%%", $safe);
    $safe=preg_replace('/\t/', "%%tab%%", $safe);
    // Any whitespace still left (spaces, carriage returns) becomes %%space%%
    $safe=preg_replace('/\s/', "%%space%%", $safe);
    return $safe;
}

function unprotect_characters($array){
    $safe=$array[0];
    // "\n" and "\t" in double quotes are a real newline and a real tab
    $safe=preg_replace('/%%newline%%/', "\n", $safe);
    $safe=preg_replace('/%%tab%%/', "\t", $safe);
    $safe=preg_replace('/%%space%%/', " ", $safe);
    return $safe;
}

// First protect spaces and other characters in text areas and tags
$text = preg_replace_callback("/>[^<]*<\\/textarea/i", "protect_characters", $text);
$text = preg_replace_callback('/\\"[^\\"<>]+\\"/', "protect_characters", $text);

// ... DO YOUR MANIPULATIONS HERE ...

// Change back our characters that were protected via the protect routine above
$text = preg_replace_callback('/\\"[^\\"<>]+\\"/', "unprotect_characters", $text);
// Use the same case-insensitive match here as in the protect step
$text = preg_replace_callback("/>[^<]*<\\/textarea/i", "unprotect_characters", $text);
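
To see the whole round trip in one place, here is a minimal self-contained sketch on a made-up page. The helper names protect_whitespace() and restore_whitespace() are hypothetical, chosen so the sketch runs on its own, and they use strtr() instead of preg_replace() for the swaps. For brevity it protects only the textarea contents and leaves out the quoted attribute values.

```php
<?php
// Hypothetical helpers; strtr() does the same swaps without a regex.
function protect_whitespace(array $m): string {
    return strtr($m[0], ["\n" => "%%newline%%", "\t" => "%%tab%%", " " => "%%space%%"]);
}

function restore_whitespace(string $text): string {
    return strtr($text, ["%%newline%%" => "\n", "%%tab%%" => "\t", "%%space%%" => " "]);
}

// Made-up sample page: the whitespace inside the textarea must survive.
$html = "<p>Hello     world</p>\n\n<textarea>a\tb\n  c</textarea>";

// 1. Protect the textarea contents.
$html = preg_replace_callback('/>[^<]*<\/textarea/i', 'protect_whitespace', $html);

// 2. Manipulate: collapse whitespace runs everywhere else.
$html = preg_replace('/\s+/', ' ', $html);

// 3. Restore the protected characters.
$html = restore_whitespace($html);

var_dump($html === "<p>Hello world</p> <textarea>a\tb\n  c</textarea>"); // bool(true)
```

One caveat with any placeholder scheme like this: if the input already contains the literal text %%tab%%, %%space%% or %%newline%%, it will be converted in the restore step, so pick placeholders that cannot appear in your documents.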
