There are free online tools that will generate a sitemap.xml for you, but the one I used would only index 500 pages. My site www.techinterchange.com.au has more than that, so I decided to write a bash script that crawls the site and generates a sitemap.xml.
It's simple for now; I'll keep adding features and making it more reliable. At the moment it only works with WordPress, which I'll improve in the next revision.
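A quick note on how it works: with --spider --no-verbose, wget crawls the site without saving anything and writes one log line per URL it checks. Judging by the fields the pipeline below pulls out, each line carries the address after a URL: marker followed by the HTTP status (the exact layout varies between wget versions), something like this illustrative line:

2016-01-01 00:00:00 URL:http://www.techinterchange.com.au/about/ 200 OK

The grep/awk chain then keeps only the lines for pages on this site that returned 200 OK and throws away the WordPress internals.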
#!/bin/bash
# Usage: ./sitemap.sh <site base URL>

url=$1
linklist="/tmp/linklist.txt"
urlcleaned="/tmp/url_clean.txt"
endresult="sitemap.xml"
# Timestamp used as <lastmod> for every entry (assumes the server runs UTC).
date=$(date +%Y-%m-%dT%H:%M:%S+00:00)

# Spider the whole site (nothing is saved) and log every URL wget finds.
wget --spider --recursive --level=100000 --no-verbose --output-file="${linklist}" "${url}"

# Pull the URLs out of the wget log: keep only pages on this site that
# returned 200 OK, strip the trailing status text, and drop WordPress
# plumbing (feeds, oembed, wp-admin, wp-includes, wp-content, wp-json,
# xmlrpc). Leaving the " 200 OK" on the line would end up inside <loc>,
# so it is removed before deduplicating.
grep URL "${linklist}" \
    | awk -F 'URL:' '{print $2}' \
    | grep "$url" \
    | grep "200 OK" \
    | sed 's# 200 OK$##' \
    | grep -v "feed/$" \
    | grep -v "oembed" \
    | grep -v "wp-includes" \
    | grep -v "wp-content" \
    | grep -v "wp-admin" \
    | grep -v "wp-json" \
    | grep -v "xmlrpc" \
    | sort -u > "${urlcleaned}"

# Write the sitemap: XML header, one <url> entry per page, closing tag.
echo '<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">' > "${endresult}"

while read -r i
do
    echo "<url>" >> "${endresult}"
    echo "<loc>$i</loc>" >> "${endresult}"
    echo "<lastmod>${date}</lastmod>" >> "${endresult}"
    echo "</url>" >> "${endresult}"
done < "${urlcleaned}"

echo "</urlset>" >> "${endresult}"
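To run it, save the script (sitemap.sh is just the name I'm using here), make it executable, and pass your site's base URL as the first argument. Expect the crawl to take a while on a big site, since wget has to request every page:

chmod +x sitemap.sh
./sitemap.sh http://www.techinterchange.com.au

The generated sitemap.xml will look something like this (the post URL and date below are only illustrative). Note that every entry gets the same <lastmod>: the time the script ran, not the time the page was actually modified.

<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" ...>
<url>
<loc>http://www.techinterchange.com.au/example-post/</loc>
<lastmod>2016-01-01T00:00:00+00:00</lastmod>
</url>
</urlset>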