Using Regular Expressions in PHP


Author: Super Admin | Comments: 0 | Views: 3,469 | Posted: 18-09-13 11:32


Extracting images and links from a string with regular expressions

// Extract only <a> links (group 1 in $out1 is the link text)

preg_match_all("|<a[^>]+>(.*)</a>|U",$str,$out1, PREG_PATTERN_ORDER);

preg_match_all("|<a[^>]+>.*</a>|U",$str,$out2, PREG_PATTERN_ORDER);

preg_match_all("^<a.*</a>^U", $str, $out3);

// Extract only URLs that start with http://

// Single quotes keep the backslash in \. (a literal dot)
preg_match_all('((http)://[a-z0-9-]+\.[][a-zA-Z0-9:&@=_~%;?/.+-]+)', $str, $out4, PREG_PATTERN_ORDER);

// Extract only images (group 1 is the src value)

preg_match_all('/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i', $str, $out5);

echo "<pre>";

print_r ($out1);
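A minimal, self-contained run of the link pattern and the http-URL pattern above. The sample `$str` is made up for illustration. With `PREG_PATTERN_ORDER`, index 0 of the result holds the full matches and index 1 holds capture group 1, so `$out1[1]` ends up with just the link text.

```php
<?php
// Hypothetical sample HTML: two links, plus two plain http URLs in the text.
$str = '<a href="/a">First</a> <a href="/b">Second</a> see http://example.com/guide?page=2 or http://www.example.org/a.html today';

// |...| delimiters; U (ungreedy) makes .* stop at the first </a> after each tag.
preg_match_all('|<a[^>]+>(.*)</a>|U', $str, $out1, PREG_PATTERN_ORDER);

// (...) are the delimiters and (http) is group 1. \. is a literal dot, so the
// pattern is single-quoted. Only http:// is matched, not https://.
preg_match_all('((http)://[a-z0-9-]+\.[][a-zA-Z0-9:&@=_~%;?/.+-]+)', $str, $out4, PREG_PATTERN_ORDER);

print_r($out1[0]); // whole <a ...>...</a> tags
print_r($out1[1]); // link text only
print_r($out4[0]); // full http:// URLs
```

The URL character class also contains `.`, so a URL that ends a sentence will keep the trailing period.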

preg_match('((http)://[a-z0-9-]+\.[][a-zA-Z0-9:&@=_~%;?/.+-]+)', $item['description'], $match); // extract only URLs starting with http://
preg_match('/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i', $item['description'], $match); // extract only images
preg_match('~http://.*\.(jp[e]?g|gif|png)~Ui', $item['description'], $match); // extract only the first image URL

Useful when building an RSS viewer and you only need to pull the images out of an item's body.
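For the RSS case, the first-image pattern can be wrapped in a small function. `firstImageUrl` and the sample `$item` are hypothetical; the pattern is the first-image one above, written with `~` delimiters so `http://` needs no escaping. Like that pattern, it only finds URLs that start with `http://` and end in .jpg, .jpeg, .gif or .png.

```php
<?php
// Hypothetical helper: return the first image URL in an RSS item's
// description, or null if there is none.
function firstImageUrl(string $description): ?string
{
    // U makes .* lazy, so the match ends at the first .jpg/.jpeg/.gif/.png;
    // i makes the extension check case-insensitive.
    if (preg_match('~http://.*\.(jp[e]?g|gif|png)~Ui', $description, $match)) {
        return $match[0];
    }
    return null;
}

// Hypothetical RSS item, shaped like the $item array used above.
$item = ['description' => 'Photo: <img src="http://example.com/pics/cat.JPG"> and <img src="http://example.com/pics/dog.png">'];
echo firstImageUrl($item['description']), "\n";
```

Because `.` also matches spaces and quotes, a non-image `http://` link earlier in the text can produce a match that runs on into the next image extension.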

A regex that keeps only the URL from an <a> tag and removes the <a> tag itself

$content = preg_replace('~<a href="([^" ]*)"[^>]*>.*?</a>~is', '$1 ', $content);
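Running this kind of replacement on a hypothetical two-link string shows what survives: each `<a>` element becomes its href value followed by a space. Like the line above, the pattern assumes `href` is the first attribute and is double-quoted; links written any other way are left untouched.

```php
<?php
// Hypothetical input with two links.
$content = 'See <a href="http://example.com/a" target="_blank">this page</a> or <a href="http://example.com/b">that one</a>.';

// Group 1 captures the href value; [^>]* skips any further attributes;
// .*? (lazy, and s lets it span lines) consumes the link text up to </a>.
// '$1 ' is single-quoted: inside double quotes "\1" would be an octal escape.
$content = preg_replace('~<a href="([^" ]*)"[^>]*>.*?</a>~is', '$1 ', $content);

echo $content, "\n";
```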

Source: http://banasun.tistory.com/entry/php-a-href-tag에서-url만-추출하고-a-tag-제거 [banana like sunshine]

With a regular expression you can extract either the whole img tag or just its src value from a post's content.

Example:

$contents = '<img src="http://okkks.tistory.com/img/2015/07/01/okkks.tisotry.com_01.jpg" alt="okkks.tisotry.com_01.jpg" /><br /><img src="http://okkks.tistory.com/img/2015/07/01/okkks.tisotry.com_02.jpg" />';

// Extract the whole img tag and the src value in one regex pass
preg_match_all('/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i', $contents, $matches);

// Whole img tags

print_r($matches[0]);

// src values only

print_r($matches[1]);

// For each src value, get the part that follows the "img" substring

$ary_rtn = array();

foreach ($matches[1] as $k => $v) {
    // Split on "img"; $t[1] is the text after the first "img"
    $t = explode("img", $v);
    array_push($ary_rtn, $t[1]);
}

echo "<br />";
var_dump($ary_rtn);

// Result

Source: http://okkks.tistory.com/1078 [이건없지]

Neutralize TEXTAREA tags (replace them with [TEXTAREA] / [/TEXTAREA] markers)
$content = preg_replace("!<TEXTAREA(.*?)>!is","[TEXTAREA]",$content);
$content = preg_replace("!</TEXTAREA(.*?)>!is","[/TEXTAREA]",$content);
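A short check of the TEXTAREA replacement on made-up input. Turning the tags into plain-text markers instead of deleting them presumably keeps user-submitted text from opening or closing a real textarea on the page, for example inside an edit form.

```php
<?php
// Hypothetical user-submitted HTML that contains a textarea.
$content = '<p>Hi</p><textarea name="x" rows="3">typed text</textarea>';

// i: case-insensitive, so <textarea> matches too; s: . also matches newlines.
// .*? stops at the first >, so each tag is replaced on its own.
$content = preg_replace('!<TEXTAREA(.*?)>!is', '[TEXTAREA]', $content);
$content = preg_replace('!</TEXTAREA(.*?)>!is', '[/TEXTAREA]', $content);

echo $content, "\n";
```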

Remove script blocks
$str=preg_replace("!<script(.*?)</script>!is","",$str);

Remove iframes
$str=preg_replace("!<iframe(.*?)</iframe>!is","",$str);

Remove meta tags
$str=preg_replace("!<meta(.*?)>!is","",$str);

Remove style blocks
$str=preg_replace("!<style(.*?)</style>!is","",$str);
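The four removals can be bundled into one call, since `preg_replace()` accepts an array of patterns and applies them in order. `stripUnsafeBlocks` is a hypothetical helper name. These patterns only drop the listed elements; attributes such as `onclick` pass through untouched, so this is not a complete HTML sanitizer.

```php
<?php
// Hypothetical helper applying the script/iframe/meta/style removals above.
function stripUnsafeBlocks(string $str): string
{
    $patterns = [
        '!<script(.*?)</script>!is', // <script>...</script>, even across lines
        '!<iframe(.*?)</iframe>!is', // <iframe>...</iframe>
        '!<meta(.*?)>!is',           // <meta ...> (no closing tag)
        '!<style(.*?)</style>!is',   // <style>...</style>
    ];
    // Every pattern is replaced with the empty string.
    return preg_replace($patterns, '', $str);
}

$html = "<meta charset=\"utf-8\"><style>p{color:red}</style><p>Body</p>\n<script>\nalert(1);\n</script><iframe src=\"x\"></iframe>";
echo stripUnsafeBlocks($html);
```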

Convert &nbsp; to a regular space
$str=str_replace("&nbsp;", " ", $str);

Collapse consecutive whitespace into a single space
$str=preg_replace('/\s{2,}/', ' ', $str);
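The two whitespace steps in order, on a made-up string. Because of `{2,}`, a lone tab or newline is left alone; only runs of two or more whitespace characters are collapsed.

```php
<?php
$str = "Hello&nbsp;&nbsp;world,\n\n   again";

// First turn each &nbsp; entity into a plain space...
$str = str_replace('&nbsp;', ' ', $str);
// ...then collapse any run of 2+ whitespace characters (spaces, tabs, newlines).
$str = preg_replace('/\s{2,}/', ' ', $str);

echo $str, "\n";
```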

Remove style= attributes inside tags
$str=preg_replace('/ style=[^"\'\s>]+/', '', $str); // style=border:0... (unquoted value)
$str=preg_replace('/ style="[^"]*"/', '', $str);    // style="border:0..." (quoted value)

Remove width= and height= attributes inside tags
$str=preg_replace('/ width="?\d+"?/', '', $str);
$str=preg_replace('/ height="?\d+"?/', '', $str);
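The attribute removals bundled into a hypothetical helper and run on sample markup. Each pattern requires a leading space, so it only hits attributes that follow a space inside a tag. The width/height patterns only cover plain numbers: `width="100%"` would lose ` width="100` and leave `%"` behind.

```php
<?php
// Hypothetical helper applying the style/width/height removals above.
function stripPresentationAttrs(string $str): string
{
    $str = preg_replace('/ style=[^"\'\s>]+/', '', $str); // unquoted: style=color:red
    $str = preg_replace('/ style="[^"]*"/', '', $str);     // quoted: style="border:0; ..."
    $str = preg_replace('/ width="?\d+"?/', '', $str);     // width=100 or width="100"
    $str = preg_replace('/ height="?\d+"?/', '', $str);    // height=50 or height="50"
    return $str;
}

$html = '<img src="a.png" width="100" height=50 style="border:0; margin:2px"><td style=color:red>x</td>';
echo stripPresentationAttrs($html), "\n";
```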

Extract img tags and their src values
preg_match('/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i', $str, $RESULT);     // first img tag only
preg_match_all('/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i', $str, $RESULT); // every img tag
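The difference between the two calls, on a hypothetical string with two images: `preg_match()` stops after the first match and fills a flat array, while `preg_match_all()` collects every match, grouped by pattern order by default.

```php
<?php
$str = '<img src="one.jpg"><p>text</p><img alt="x" src="two.png" />';
$pattern = '/<img[^>]*src=["]?([^>"]+)["]?[^>]*>/i';

// First match only: $first[1] is a single string.
preg_match($pattern, $str, $first);
// All matches: $all[1] is an array of src values.
preg_match_all($pattern, $str, $all);

echo $first[1], "\n"; // one.jpg
print_r($all[1]);     // one.jpg, two.png
```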

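Extract the host
<?php
// ~ delimiters, so the slashes in http:// need no escaping
preg_match('~^(http://)?([^/]+)~i', "http://www.naver.com/index.php", $matches);
$host = $matches[2];
echo $matches[0] . "<br>"; // http://www.naver.com
echo $matches[1] . "<br>"; // http://
echo $matches[2] . "<br>"; // www.naver.com
?>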
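PHP also ships `parse_url()`, which can return just the host when given `PHP_URL_HOST`. A side-by-side sketch with the host regex: the regex only strips a leading `http://`, so for an `https://` URL its group 2 would be `https:`, whereas `parse_url()` handles the scheme, ports and so on.

```php
<?php
$url = 'http://www.naver.com/index.php';

// The host regex, with ~ delimiters.
preg_match('~^(http://)?([^/]+)~i', $url, $matches);

// PHP's built-in URL parser, asked for the host component only.
$host = parse_url($url, PHP_URL_HOST);

echo $matches[2], "\n"; // www.naver.com
echo $host, "\n";       // www.naver.com
```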
